
Issue 889653

Starred by 1 user

Issue metadata

Status: Assigned
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Bug




Fetch large files in parallel via multiple TCP streams

Project Member Reported by vadimsh@chromium.org, Sep 26

Issue description

There were multiple reports of extremely slow CIPD file fetches from non-ideal networks. At the same time gsutil worked pretty well.

There are two significant differences between gsutil and cipd:
1. gsutil opens N TCP streams and fetches the file from multiple offsets in parallel (CIPD doesn't).
2. gsutil uses Python's TCP library (and likely HTTP/1.1), cipd is in Go (and uses HTTP/2).

Ideally, we should learn to reproduce this behavior (perhaps using a crappy network simulator) and then try to fix it.

One concern is that with HTTP 2.0 there's a high chance "multiple" HTTP streams will in fact share one TCP connection :)
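One way to sidestep the shared-connection concern would be to give the fetcher a transport that never negotiates HTTP/2, so each concurrent request gets its own TCP connection. A minimal sketch, assuming Go's standard net/http; newHTTP1Transport is a hypothetical helper name, not something in CIPD:

```go
package main

import (
	"crypto/tls"
	"net/http"
)

// newHTTP1Transport is a hypothetical helper (not part of CIPD). It returns
// a transport that never negotiates HTTP/2, so concurrent requests to the
// same host use separate TCP connections instead of multiplexed streams.
func newHTTP1Transport(maxConnsPerHost int) *http.Transport {
	return &http.Transport{
		Proxy: http.ProxyFromEnvironment,
		// net/http documents that a non-nil, empty TLSNextProto map
		// disables HTTP/2 for this transport.
		TLSNextProto:        map[string]func(string, *tls.Conn) http.RoundTripper{},
		MaxConnsPerHost:     maxConnsPerHost,
		MaxIdleConnsPerHost: maxConnsPerHost,
	}
}
```

It would be used as `&http.Client{Transport: newHTTP1Transport(4)}`. Whether HTTP/1.1 actually helps here is untested; this only removes the multiplexing variable.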
 
From one case (bay area):

"cipd was seemingly peaking out around 500kb/s, the link would quite happily saturate at 20mb/s going direct to gcs"
Cc: raggi@google.com iannucci@chromium.org
Downloading files with Chrome from the exact same signed URLs that CIPD uses produces better speed.

(Presumably Chrome doesn't parallelize requests).

CIPD uses io.Copy(...) when reading from the TCP connection. Maybe the 32 KB buffer used by io.Copy is a limiting factor here.
This is reproducible using e.g. the "Very Bad Network" profile of OSX's Network Link Conditioner. Fetching 8 MB via Chrome takes 1:30; via CIPD it takes more than twice as long. I'll try different experiments now.
'curl' is as slow as CIPD (== 2x slower than Chrome). I suspect Chrome does parallel download too (with 2 streams).
Project Member

Comment 5 by bugdroid1@chromium.org, Sep 27

The following revision refers to this bug:
  https://chromium.googlesource.com/infra/luci/luci-go.git/+/c1d2d308e2ee636d2ddd5d2f09f888c02b65e22a

commit c1d2d308e2ee636d2ddd5d2f09f888c02b65e22a
Author: Vadim Shtayura <vadimsh@chromium.org>
Date: Thu Sep 27 00:48:47 2018

[cipd] Add simple download speed reporter.

Looks like this:

[P50311 17:31:45.574 client.go:1380 I] cipd: resolving fetch URL for ...
[P50311 17:31:45.637 storage.go:308 I] cipd: initiating the fetch
[P50311 17:31:46.223 storage.go:275 I] cipd: about to fetch 935.2 MB
[P50311 17:31:46.223 storage.go:256 I] cipd: fetching - 0% (?? KB/s)
[P50311 17:31:51.225 storage.go:256 I] cipd: fetching - 2% (5524.9 KB/s)
[P50311 17:31:56.227 storage.go:256 I] cipd: fetching - 6% (6502.5 KB/s)
[P50311 17:32:01.228 storage.go:256 I] cipd: fetching - 9% (6469.5 KB/s)
...

R=iannucci@chromium.org, nodir@chromium.org
BUG=889653

Change-Id: Ifa441d5c6d768c617095840928210e3aa0a6fa28
Reviewed-on: https://chromium-review.googlesource.com/1247581
Reviewed-by: Robbie Iannucci <iannucci@chromium.org>
Commit-Queue: Vadim Shtayura <vadimsh@chromium.org>

[modify] https://crrev.com/c1d2d308e2ee636d2ddd5d2f09f888c02b65e22a/cipd/client/cipd/client.go
[modify] https://crrev.com/c1d2d308e2ee636d2ddd5d2f09f888c02b65e22a/cipd/client/cipd/storage.go

I've confirmed with the aria2c tool that downloading a file in multiple (4) streams indeed speeds up the download 2x on the "Very Bad Network" profile, compared to a single stream (which is what cipd uses).

On a good WiFi network it has no effect (slightly negative in fact). Haven't tried on a high speed ethernet connection yet.

One huge downside of the parallel fetch is that we'll have to do a separate pass over the file to calculate its SHA256, since we won't be able to do it on the fly when fetching out of order. For large files, it may be quite noticeable (reading 1GB from disk is no joke).

So on fast networks, it would be more efficient to stick with 1 stream and on-the-fly hash calculation.

On slow networks, fetching in ~4 streams is faster.
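A multi-stream fetch could be sketched with HTTP Range requests, each goroutine writing its chunk at the right offset. All names here (byteRange, splitRanges, fetchParallel) are hypothetical, and it assumes the server honors Range with 206 responses and that the total size is known up front:

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"sync"
)

// byteRange is an inclusive [Start, End] span, matching HTTP Range syntax.
type byteRange struct{ Start, End int64 }

// splitRanges divides size bytes into at most n contiguous ranges.
func splitRanges(size int64, n int) []byteRange {
	if size <= 0 || n <= 0 {
		return nil
	}
	chunk := (size + int64(n) - 1) / int64(n) // ceil(size / n)
	var out []byteRange
	for start := int64(0); start < size; start += chunk {
		end := start + chunk - 1
		if end >= size {
			end = size - 1
		}
		out = append(out, byteRange{start, end})
	}
	return out
}

// offsetWriter turns an io.WriterAt into a sequential writer at an offset.
type offsetWriter struct {
	w   io.WriterAt
	off int64
}

func (o *offsetWriter) Write(p []byte) (int, error) {
	n, err := o.w.WriteAt(p, o.off)
	o.off += int64(n)
	return n, err
}

// fetchRange downloads one range and writes it into dst at r.Start.
func fetchRange(c *http.Client, url string, r byteRange, dst io.WriterAt) error {
	req, err := http.NewRequest("GET", url, nil)
	if err != nil {
		return err
	}
	req.Header.Set("Range", fmt.Sprintf("bytes=%d-%d", r.Start, r.End))
	resp, err := c.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusPartialContent {
		return fmt.Errorf("range %d-%d: unexpected status %s", r.Start, r.End, resp.Status)
	}
	n, err := io.Copy(&offsetWriter{w: dst, off: r.Start}, resp.Body)
	if err != nil {
		return err
	}
	if want := r.End - r.Start + 1; n != want {
		return fmt.Errorf("range %d-%d: got %d bytes, want %d", r.Start, r.End, n, want)
	}
	return nil
}

// fetchParallel downloads url in up to n concurrent ranged requests.
func fetchParallel(c *http.Client, url string, size int64, n int, dst io.WriterAt) error {
	ranges := splitRanges(size, n)
	errs := make([]error, len(ranges))
	var wg sync.WaitGroup
	for i, r := range ranges {
		wg.Add(1)
		go func(i int, r byteRange) {
			defer wg.Done()
			errs[i] = fetchRange(c, url, r, dst)
		}(i, r)
	}
	wg.Wait()
	for _, err := range errs {
		if err != nil {
			return err
		}
	}
	return nil
}
```

An *os.File satisfies io.WriterAt, so dst can be the destination file. There are no retries or cancellation here.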

Detecting network speed is not really trivial, and takes time (if using some probes). One option is to start fetching in 1 stream, and then spawn more if the speed is less than X. This is a bit complicated to implement though :(
Another avenue for parallelization is to fetch different packages in parallel. They are currently fetched sequentially, on the assumption that a single stream is able to saturate the network link (which turned out to be wrong on slow networks).

Downsides:
1. Won't help when fetching 1 package. This happens quite often (e.g. we often update only 1 software component, not all of them at once).
2. (Speculation) Writing to multiple files in parallel may decrease disk IO throughput.
Owner: vadimsh@chromium.org
Status: Assigned (was: Untriaged)
Cc: iannu...@google.com
Cc: -iannucci@chromium.org
I hit this again this evening and did some more investigation.

From home I get better performance with a wired machine than a wifi machine, but in both cases they achieve well below line rate.

Chrome achieves line rate against the same downloads.

curl with tcp fast open and http2 forced on achieves line rate about 10% of the time.

Starting to suspect there's some configuration issue at the GCS side that is making this issue more pronounced.

Good runs >300mbps. Bad runs <500kbps.
