New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 709144 link

Starred by 2 users

Issue metadata

Status: Assigned
Owner:
Last visit > 30 days ago
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Bug

Blocking:
issue 706515



Sign in to add a comment

Possible performance limitation for uploads over QUIC

Project Member Reported by ckrasic@chromium.org, Apr 6 2017

Issue description

Lack of buffering QuicChromiumClientStream::WriteStreamData()  might be hurting uploads performance

As discussed with rch@.

Contrasting with TCP which has a kernel socket buffer, and associated thresholds for accepting new data into the buffer.

Chromium QUIC code has a pretty minimal amount of buffering.  Specifically QuicChromiumClientStream::WriteStreamData() returns ERR_IO_PENDING the first time WriteOrBufferData() indicates data was buffered. It's quite likely that the size of data blocks supplied to WriteStreamData() are small relative to a typical TCP kernel socket buffer. 

In the TCP case, the buffer serves to reduce user/kernel mode switches, saving CPU.  In the QUIC case, there is an analog in thread switches between source of the data (e.g. UploadDataStream) and the network thread. Also since UploadDataStream uses a fairly naive scheme of alternating between reads and writes, the lack of buffering in QUIC could limit the ability of UploadDataStream to amortize the impact of slow reads from storage. 


 
How big should the buffer be?  Two options come to mind:

a) just big, say same size we allow for receive windows?
b) buffer as much as the current flow control window will allow.
For multiple reasons, I've been wanting to move data ownership(the equivalent of a kernel send buffer) into the QUIC streams.  This would allow larger writes from applications, which are typically more efficient, and could even enable zero-copy( prior to crypto).

This would mean StreamFrames no longer own data, which would remove one extra copy and allocation from the send path and would be great for the future of multipath.

So ideally I'd like to fix this on the client and server side, since I believe it can benefit both.
For buffering, I was thinking the overall buffer would be enough for any outstanding unacked data + the size of the last application write.  ie: we'd consume whatever data the application had for us all at once, and then tell it we're full.  I think this is a bit like TCP lowat, but with a very small low water mark?
As rch@ pointed out, this line of code is pertinent:

https://codesearch.chromium.org/chromium/src/net/quic/chromium/quic_http_stream.cc?rcl=e6a1fd7a7f2fc50b1fa4311ebd75d43195d69d00&l=294

So, it would seem that ~14KB is the upper bound currently on data buffered in the QUIC write path.
Early results, it seems like this can improve throughput of CPU bound uploads by about 20%.

Based on uploading a 5GiB file to drive from my workstation, 5 uploads each at buffer sizes of 10, 128, 256, 512, 1024 * kMaxSizePacket.

In these tests, the best improvement was reached with 256 * kMaxSizePacket.

The same code space likely has another issue if/when there are storage latencies that would benefit from more overlap with network writes.  Adding a kind of watermark to trigger stream writes earlier would help there.   



Comment 6 by rch@chromium.org, Apr 7 2017

Excellent progress! 

As a straw-man for the overlap, I would suggest we invoke callback_ from OnWrite() when the buffer drops below 50%. (Not necessarily 50% of capacity, but 50% of whatever it was at when it was last filled)

Comment 7 by jri@chromium.org, Apr 7 2017

Awesome find!
Sounds like great progress.

Can you summarize what the differences in bandwidth were at different sizes?  Can you include 64 as a size as well?
Average Bandwidths (Mbps) were: 286, 337, 356, 339, 339

I'll get numbers for 64x tomorrow.

Did some more runs, including 64x.

64x did quite well, managing average 343 Mpbs, and 17% speedup over 10x.

64x and above consistently outperform the status quo 10x by at least 15%.   The gains beyond 64x have a good deal of variance.  256 looks like the sweet spot, but not with a lot of confidence.

Comment 11 by mef@chromium.org, Apr 11 2017

Blocking: 706515
Project Member

Comment 12 by bugdroid1@chromium.org, Jul 21 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/d4619709908bcfb3746813b656ff80dc242a9388

commit d4619709908bcfb3746813b656ff80dc242a9388
Author: Charles 'Buck' Krasic <ckrasic@chromium.org>
Date: Fri Jul 21 21:52:55 2017

QUIC - increase buffer size to improve upload performance.

Increase the size of QuicHttpStream::raw_request_body_buf_ based on the
size of the body.  For large bodies and fast networks, this give a nice
reduction in CPU (or boost throughput if CPU bound) (~10-20%).

R=rch@chromium.org

Bug: 709144
Change-Id: If4fc69f9f89866b75b6d09d728e53a7bfa2054e4
Reviewed-on: https://chromium-review.googlesource.com/581877
Reviewed-by: Buck Krasic <ckrasic@chromium.org>
Reviewed-by: Ryan Hamilton <rch@chromium.org>
Commit-Queue: Buck Krasic <ckrasic@chromium.org>
Cr-Commit-Position: refs/heads/master@{#488765}
[modify] https://crrev.com/d4619709908bcfb3746813b656ff80dc242a9388/net/quic/chromium/quic_http_stream.cc

Sign in to add a comment