
Issue 856921

Starred by 4 users

Issue metadata

Status: Available
Owner: ----
Components:
EstimatedDays: ----
NextAction: ----
OS: Mac
Pri: 2
Type: Bug




Large CSS/JS files download speed slow on Chromium only

Reported by pikachu...@gmail.com, Jun 27 2018

Issue description

UserAgent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36

Example URL:
https://cdn.spacious.hk/assets/apps/spacious_website/render_unblocking/phase_2-3429eed7918e4940829d5cfe724e13f9110114f853111b6f9c49add67d36b0aa.css

Steps to reproduce the problem:
1. Launch Chrome
2. Load https://cdn.spacious.hk/assets/apps/spacious_website/render_unblocking/phase_2-3429eed7918e4940829d5cfe724e13f9110114f853111b6f9c49add67d36b0aa.css
3. Feel the slowness

What is the expected behavior?
The transfer time should be 200ms ~ 500ms on average

What went wrong?
Transfer time is always > 500 ms; when the file is requested directly, it can take several seconds

Did this work before? N/A 

Chrome version: 67.0.3396.99  Channel: stable
OS Version: OS X 10.12.6
Flash Version: 

According to the Cloudflare support response:
```
I was able to reproduce the issue you described and it seems that the issue relates to how Chrome downloads the content and saves it to cache rather than a bottleneck at Cloudflare.

This can be checked from Chrome's net-internals (chrome://net-internals) page describing the request and how it's handling the response which I've attached in this ticket. Instead of downloading the entire content from the server then cache the response it seems to be doing it on the fly.
```

See the attached screenshots for timings & request/response headers from different browsers, including:
- Chrome 67
- Firefox 61
- Safari 11.1.1

The normal transfer time can also be confirmed with curl:

```
$ curl 'https://cdn.spacious.hk/assets/apps/spacious_website/render_unblocking/phase_2-3429eed7918e4940829d5cfe724e13f9110114f853111b6f9c49add67d36b0aa.css' -H 'Connection: keep-alive' -H 'Pragma: no-cache' -H 'Cache-Control: no-cache' -H 'Upgrade-Insecure-Requests: 1' -H 'DNT: 1' -H 'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3472.3 Safari/537.36' -H 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8' -H 'Accept-Encoding: gzip, deflate, br' -H 'Accept-Language: en' -H 'Cookie: __cfduid=d4ca69e8aaeff5c23321cc96d9e836c0a1530068899' -w "           Content Type: %{content_type}\n              HTTP Code: %{http_code}\n           HTTP Connect: %{http_connect}\n        Number Connects: %{num_connects}\n       Number Redirects: %{num_redirects}\n           Redirect URL: %{redirect_url}\n          Size Download: %{size_download}\n Average Speed Download: %{speed_download}\n            Size Upload: %{size_upload}\n             SSL Verify: %{ssl_verify_result}\n         Time Handshake: %{time_appconnect}\n           Time Connect: %{time_connect}\n       Name Lookup Time: %{time_namelookup}\n       Time Pretransfer: %{time_pretransfer}\n          Time Redirect: %{time_redirect}\n    Time Start Transfer: %{time_starttransfer}\n                        ----------\n             time_total:  %{time_total}\n\nEffective URL: %{url_effective}\n" --output /dev/null

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  223k    0  223k    0     0  1370k      0 --:--:-- --:--:-- --:--:-- 1370k
           Content Type: text/css
              HTTP Code: 200
           HTTP Connect: 000
        Number Connects: 1
       Number Redirects: 0
           Redirect URL:
          Size Download: 228711
 Average Speed Download: 1403134.000
            Size Upload: 0
             SSL Verify: 0
         Time Handshake: 0.081291
           Time Connect: 0.017964
       Name Lookup Time: 0.014509
       Time Pretransfer: 0.081376
          Time Redirect: 0.000000
    Time Start Transfer: 0.092065
                        ----------
             time_total:  0.163514

Effective URL: https://cdn.spacious.hk/assets/apps/spacious_website/render_unblocking/phase_2-3429eed7918e4940829d5cfe724e13f9110114f853111b6f9c49add67d36b0aa.css
```

The events in chrome://net-internals/#events can be reproduced, so I am not posting the event log here.
 
chrome_request_headers.png (80.6 KB)
chrome_response_headers.png (50.9 KB)
chrome_timings.png (15.9 KB)
firefox_61_request_headers.png (20.5 KB)
firefox_61_response_headers.png (51.2 KB)
firefox_61_timings.png (9.9 KB)
safari_11_1_1_request_headers.png (72.4 KB)
safari_11_1_1_response_headers.png (38.0 KB)
safari_11_1_1_timings.png (10.6 KB)

Comment 1 by mmenke@chromium.org, Jun 27 2018

Cc: morlovich@chromium.org
Hrm...  We get the entire response body in the first 100 milliseconds, and start decompressing it and sending the body along as soon as we get it.  However, at some later point, we delay the request for 1 second without anything trying to read from the body of the request, so the issue looks to be at a higher layer than the network stack.

[+morlovich]:  Looks like you're on triage today.  This looks like a tractable issue that may well be worth understanding.

[pikachuexe]:  Do you only see the issue with that particular CSS resource as a top-level page, or do you also see the issue when that resource is actually embedded in a page as CSS?
Status: Assigned (was: Unconfirmed)
OK, well, that's one way to triage it :D
I think there were some earlier reports where there seemed to be slowness at a higher level with a navigation; something like opening a big JSON?

Comment 3 by mmenke@chromium.org, Jun 27 2018

Status: Untriaged (was: Assigned)
I added you to the CC list, didn't assign it to you.  I mean, feel free to assign it to yourself if you want; I just figure you're on triage, this reproduces, and it has a big impact in at least one case, so it seems like a good thing to look into during your shift.
Cc: -morlovich@chromium.org
Owner: morlovich@chromium.org
Status: Assigned (was: Untriaged)
Yeah, I assigned it to myself, with the bonus of removing it from my triage queue and putting it on my work queue.

I am not sure I'll actually figure anything out, but it's worth a shot.
Both direct requests and sub-resource requests are slower than in non-Chromium browsers & curl.
Direct requests take about 10x more time than sub-resource requests to transfer the content.

I attached timings and net-internals logs for:
- direct request with "Disable cache" enabled
- sub-resource request with "Disable cache" enabled
- direct request with "Disable cache" disabled (request still sent with 304 response)
- sub-resource request with "Disable cache" disabled (directly read from disk cache)
direct_access_from_disk_cache_net_internals.txt (21.5 KB)
direct_access_from_disk_cache_timings.png (35.2 KB)
direct_access_no_cache_net_internals.txt (41.7 KB)
direct_access_no_cache_timings.png (34.8 KB)
sub_resource_from_disk_cache_net_internals.txt (15.0 KB)
sub_resource_from_disk_cache_timings.png (36.0 KB)
sub_resource_no_cache_net_internals.txt (164 KB)
sub_resource_no_cache_timings.png (36.5 KB)
So the subresource case is bad (600-900 ms or so), and the direct one is horrible (8 seconds!). What was the subresource set up as: just <script src=>, or something fancier?

Still looks like some higher-level-than-network-stack stuff, though. The disk cache level logs all look pretty nice...

```
<link rel=preload as=style href="//cdn.spacious.hk/{css_path}" media=all onload="this.onload=null;this.rel='stylesheet'"/>
<noscript>
<link rel=stylesheet media=all href="//cdn.spacious.hk/{css_path}"/>
</noscript>
```

With the fallback script (which should not run on Chrome) from
https://github.com/filamentgroup/loadCSS

But a JS file loaded via a script tag with async & defer is also slow:
```
<script src="//cdn.spacious.hk/{js_path}" debug=false crossorigin=anonymous async defer></script> 
```
chrome_js_from_disk_cache_timings.png (37.6 KB)
Let me know if anything else is needed
(more logs, testing with a dev version, etc.)
Well, I can't reproduce the subresource case with just the above, so if you have a handy test page that shows the problem, it would be helpful.

(I can reproduce the top-level one, so I'll dig into that, but it's likely to be less important).

The top level one may well just be slowness in formatting a really long string without any whitespace for display - I'm not even sure it's worth digging into.
That does seem likely to me as well (given that my debug builds are about 3x slower at it, too!), but I figured it would be a good excuse to look around... but then on further thought, navigation and resource paths are pretty different, aren't they? 
Yea, they are.  If you want to dig around the navigation code to better understand this case, don't let my comment stop you - I think it's good code to know, since it's rather network-stack-adjacent.
So in the top-level case, from the browser side we're just stopped by flow control at the MojoAsyncResourceHandler level... time to learn what the renderer-side peer to that is.

This is much easier to see if I make URLRequest::LogBlockedBy log everything and not just the first spot, by commenting out this: https://cs.chromium.org/chromium/src/net/url_request/url_request.cc?rcl=aa72414951c47a474b12f873e6aed8ca677a4057&l=338

And it does seem likely that rendering is a huge factor --- if I switch away from that tab to net-internals, it renders in "only" about 3 sec rather than about 21, and flow control pauses look like this:
t=208808 [st= 216]    DELEGATE_INFO  [dt=6]
                      --> delegate_blocked_by = "MojoAsyncResourceHandler"

rather than more like this:
t= 9407 [st= 6226]    DELEGATE_INFO  [dt=574]
                      --> delegate_blocked_by = "MojoAsyncResourceHandler"


As a resource load it looks OK-ish --- 100 ms is bigger than it should be, but that's quite possibly just because I am on a slow debug build.
Yep, ~14-20 ms on the subresource (from cache) in an optimized build; maybe it could be improved, but it's not in the obviously-broken-slow range, so I will probably need a testcase...
And it does look like the slow top-level loading was indeed due to rendering (though it's not as slow as reported in an optimized build) --- the debug output I've added is reasonably compelling:
[1:1:0709/155320.867204:ERROR:url_response_body_consumer.cc(160)] 0x3cd9ae427c00 OnReceivedData:65536 sent to:N7content16WebURLLoaderImpl15RequestPeerImplE
[1:1:0709/155320.876633:ERROR:local_frame_view.cc(718)] 0x3fff12047d30 Begin LocalFrameView::PerformLayout
[1:1:0709/155320.876827:ERROR:local_frame_view.cc(786)] 0x3fff12047d30 End LocalFrameView::PerformLayout 0.148 ms
[1:1:0709/155320.878606:ERROR:url_response_body_consumer.cc(160)] 0x3cd9ae427c00 OnReceivedData:65536 sent to:N7content16WebURLLoaderImpl15RequestPeerImplE
[1:1:0709/155320.880169:ERROR:url_response_body_consumer.cc(160)] 0x3cd9ae427c00 OnReceivedData:65536 sent to:N7content16WebURLLoaderImpl15RequestPeerImplE
[1:1:0709/155320.881106:ERROR:url_response_body_consumer.cc(160)] 0x3cd9ae427c00 OnReceivedData:65536 sent to:N7content16WebURLLoaderImpl15RequestPeerImplE
[1:1:0709/155320.881322:ERROR:local_frame_view.cc(718)] 0x3fff12047d30 Begin LocalFrameView::PerformLayout
[1:1:0709/155321.087516:ERROR:local_frame_view.cc(786)] 0x3fff12047d30 End LocalFrameView::PerformLayout 206.218 ms
[1:1:0709/155321.088540:ERROR:url_response_body_consumer.cc(160)] 0x3cd9ae427c00 OnReceivedData:65536 sent to:N7content16WebURLLoaderImpl15RequestPeerImplE
[1:1:0709/155321.088846:ERROR:local_frame_view.cc(718)] 0x3fff12047d30 Begin LocalFrameView::PerformLayout
[1:1:0709/155321.143878:ERROR:local_frame_view.cc(786)] 0x3fff12047d30 End LocalFrameView::PerformLayout 55.019 ms
[1:1:0709/155321.215692:ERROR:url_response_body_consumer.cc(160)] 0x3cd9ae427c00 OnReceivedData:65536 sent to:N7content16WebURLLoaderImpl15RequestPeerImplE
[1:1:0709/155321.215906:ERROR:local_frame_view.cc(718)] 0x3fff12047d30 Begin LocalFrameView::PerformLayout
[1:1:0709/155321.261556:ERROR:local_frame_view.cc(786)] 0x3fff12047d30 End LocalFrameView::PerformLayout 45.683 ms
[1:1:0709/155321.262881:ERROR:url_response_body_consumer.cc(160)] 0x3cd9ae427c00 OnReceivedData:65536 sent to:N7content16WebURLLoaderImpl15RequestPeerImplE
[1:1:0709/155321.263131:ERROR:local_frame_view.cc(718)] 0x3fff12047d30 Begin LocalFrameView::PerformLayout
[1:1:0709/155321.299000:ERROR:local_frame_view.cc(786)] 0x3fff12047d30 End LocalFrameView::PerformLayout 35.887 ms
[1:1:0709/155321.300314:ERROR:url_response_body_consumer.cc(160)] 0x3cd9ae427c00 OnReceivedData:65536 sent to:N7content16WebURLLoaderImpl15RequestPeerImplE
[1:1:0709/155321.300630:ERROR:local_frame_view.cc(718)] 0x3fff12047d30 Begin LocalFrameView::PerformLayout
[1:1:0709/155321.334622:ERROR:local_frame_view.cc(786)] 0x3fff12047d30 End LocalFrameView::PerformLayout 34.005 ms
[1:1:0709/155321.336240:ERROR:url_response_body_consumer.cc(160)] 0x3cd9ae427c00 OnReceivedData:65536 sent to:N7content16WebURLLoaderImpl15RequestPeerImplE
[1:1:0709/155321.336583:ERROR:local_frame_view.cc(718)] 0x3fff12047d30 Begin LocalFrameView::PerformLayout
[1:1:0709/155321.375002:ERROR:local_frame_view.cc(786)] 0x3fff12047d30 End LocalFrameView::PerformLayout 38.462 ms
[1:1:0709/155321.376762:ERROR:url_response_body_consumer.cc(160)] 0x3cd9ae427c00 OnReceivedData:65536 sent to:N7content16WebURLLoaderImpl15RequestPeerImplE
[1:1:0709/155321.376991:ERROR:local_frame_view.cc(718)] 0x3fff12047d30 Begin LocalFrameView::PerformLayout
[1:1:0709/155321.401373:ERROR:local_frame_view.cc(786)] 0x3fff12047d30 End LocalFrameView::PerformLayout 24.426 ms
[1:1:0709/155321.402991:ERROR:url_response_body_consumer.cc(160)] 0x3cd9ae427c00 OnReceivedData:65536 sent to:N7content16WebURLLoaderImpl15RequestPeerImplE

=> The main thread is doing lots of re-layouts, which are kinda expensive, and so it's not reading from mojo.

Anyway, the renderer people may care about that (or quite possibly already know about it), but it seems like the subresource case is what we in network should indeed focus on,
so a testcase that demonstrates it would be very much appreciated, as I can't reproduce it by pasting the CSS markup from comment #7 into a file (with the proper URL, of course).

I tested direct access again while the tab was inactive (looking at a blank tab while it loads),
and the content download time seems more normal now (~300 ms).

It also seems that rendering affects the content download time for sub-resource requests (~200 ms longer).

But it's still strange that the content download times for requests from the CDN and from the disk cache are similar (~500 ms).

Is the disk cache too slow? Or are the other operations for displaying the webpage slowing down both requests?

I am testing with https://www.spacious.hk/en/hong-kong

direct_access_no_cache_tab_inactive_timings.png (35.8 KB)
sub_resource_from_disk_cache_tab_active_timings.png (36.4 KB)
sub_resource_from_disk_cache_tab_inactive_timings.png (35.5 KB)
sub_resource_from_disk_cache_tab_inactive_timings.txt (15.1 KB)
sub_resource_no_cache_tab_active_timings.png (36.6 KB)
sub_resource_no_cache_tab_inactive_timings.png (37.0 KB)
sub_resource_no_cache_tab_inactive_timings.txt (144 KB)
Thanks, that I can reproduce (well, at least on the stable package; the dev tree is doing other stuff at the moment)...
And while our cache isn't great, I think the slowness is above that --- for example, the netlog for this says, among other things, this:

t=207763 [st=  75]    HTTP_CACHE_READ_DATA  [dt=8]
t=207771 [st=  83]    URL_REQUEST_JOB_BYTES_READ
                      --> byte_count = 32768
t=207771 [st=  83]    URL_REQUEST_JOB_FILTERED_BYTES_READ
                      --> byte_count = 6686
t=207949 [st= 261]    URL_REQUEST_JOB_FILTERED_BYTES_READ
                      --> byte_count = 32768
t=207949 [st= 261]    URL_REQUEST_JOB_FILTERED_BYTES_READ
                      --> byte_count = 32768
t=208095 [st= 407]    URL_REQUEST_JOB_FILTERED_BYTES_READ
                      --> byte_count = 32768
t=208096 [st= 408]    URL_REQUEST_JOB_FILTERED_BYTES_READ
                      --> byte_count = 32768
t=208251 [st= 563]    URL_REQUEST_JOB_FILTERED_BYTES_READ
                      --> byte_count = 32768
t=208251 [st= 563]    URL_REQUEST_JOB_FILTERED_BYTES_READ
                      --> byte_count = 32768
t=208376 [st= 688]    URL_REQUEST_JOB_FILTERED_BYTES_READ
                      --> byte_count = 32768
t=208376 [st= 688]    URL_REQUEST_JOB_FILTERED_BYTES_READ
                      --> byte_count = 32768
t=208618 [st= 930]    URL_REQUEST_JOB_FILTERED_BYTES_READ
                      --> byte_count = 32768
t=208618 [st= 930]    URL_REQUEST_JOB_FILTERED_BYTES_READ
                      --> byte_count = 32768
t=208683 [st= 995]    URL_REQUEST_JOB_FILTERED_BYTES_READ
                      --> byte_count = 32768
t=208683 [st= 995]    URL_REQUEST_JOB_FILTERED_BYTES_READ
                      --> byte_count = 32768
t=209209 [st=1521]    URL_REQUEST_JOB_FILTERED_BYTES_READ
                      --> byte_count = 32768
t=209209 [st=1521]    URL_REQUEST_JOB_FILTERED_BYTES_READ
                      --> byte_count = 32768
t=209536 [st=1848]    URL_REQUEST_JOB_FILTERED_BYTES_READ
                      --> byte_count = 32768
t=209536 [st=1848]    URL_REQUEST_JOB_FILTERED_BYTES_READ
                      --> byte_count = 11415

st= here is the time in milliseconds since the start of the request.
This means it took about 8 ms to get a chunk of data (which happens to be 32K) out of the cache.
Then brotli decompressed it to a very impressive amount over a bunch of reads from the upper layer...
spread over about 1.8 seconds! I can't say why the reads are so spread out without a hacked-up
dev build, though.

So it does look to me like, to use your words, "the other operations for displaying the webpage are slowing down both requests", though in this case there is too much stuff going on in the renderer for me to know for sure (especially since I am not familiar with that area).  Basically, it looks like while the big stylesheet is still loading, the page also starts loading the various ad scripts, social API widgets, etc., and starts executing them, and that slows down the renderer's consumption of the CSS from the browser, as those things take up CPU and compete for transferring data.

Also, is the CSS supposed to be (when uncompressed) about 3.2 MiB?

(Dev note: it looks to just be woken up by mojo::SimpleWatcher, and it consumes data from the pipe immediately... should priorities matter there?)

We care more about transfer size than uncompressed size for static assets,
so we ignored the 3.2 MiB size.

Going to move some less important async script tags to the bottom of <body> and test again.


Tried again after moving most JS (except the app-essential JS) to the bottom of <body>.
There is still a long delay between some `ENTRY_READ_DATA` entries under `DISK_CACHE_ENTRY`.

See the attached text file with events logged during a cache-enabled request with the tab inactive while loading (events after the CSS finished loading, such as AJAX requests where t > 4817, have been removed).

CSS URL is https://cdn.spacious.hk/assets/apps/spacious_website/render_unblocking/phase_2-51285caf07cfc1ab2ae3cff6fa15a20b75e3b0e12797419506fd414a1098030d.css
sub_resource_no_cache_tab_inactive_timings_all_events.txt (376 KB)
It might help to record Chrome traces (viewed using chrome://tracing) to do more debugging. Here's one with a 1.8 s CSS load (https://cdn.spacious.hk/assets/apps/spacious_website/render_unblocking/phase_2-51285caf07cfc1ab2ae3cff6fa15a20b75e3b0e12797419506fd414a1098030d.css).

As you can see, the main renderer thread is completely busy for the entire duration of the request, mostly running JavaScript. To me this indicates that we're not blocked on the network but by other activity on the site.

If you want to try to log one of these yourself, navigate to chrome://tracing and hit record. You can probably also manually curate categories, but "Web developer" might be good enough.
trace_spacious_longcss.json.gz (5.9 MB)
Components: Blink>Loader
Owner: ----
Status: Available (was: Assigned)
=> Loader, as the renderer's ability to consume data from the browser, rather than the browser's ability to fetch it, appears to be the issue.
Components: -Internals>Network
This is all renderer-side.  Deferring to the loader team.
Submitting my trace for the web page load.
trace_trace_spacious_webpage_load.json.gz (3.1 MB)
Is there any more info that should be provided to help the investigation?
Please let me know if there is any progress or if more details are needed.
Again, if there is any progress or any other info/testing is required,
please let me know.
We found that switching back to a synchronous stylesheet tag makes the loading speed normal again,
so we decided to use the sync stylesheet tag for Chromium-based browsers only
(like Chrome, Opera, Vivaldi).
Trying to add more details here.

We use the Chrome Timeline with the "Disable Cache" option unchecked,
and we record a profile after finishing one page load (so that cached content should be used).

We found that there are TWO Send Request events with different priorities.
The first "request" has fewer "Receive Data" events with a bigger Encoded Data size;
the latter "request" still has many "Receive Data" events with an Encoded Data size of 64 KB / 32 KB.

So this indicates something strange is going on even when loading cached content.

See the screenshots and saved profile in the attachments.
preload__send_request.jpg (6.5 KB)
preload__receive_data.jpg (6.0 KB)
preload__finish_loading.jpg (6.0 KB)
actual_usage__send_request.jpg (6.2 KB)
actual_usage__receive_data_64k.jpg (6.5 KB)
actual_usage__receive_data_32k.jpg (6.1 KB)
actual_usage__finish_loading.jpg (5.9 KB)
Profile-20181031T161852.json.zip (2.7 MB)
