Issue 881141

Issue metadata

Status: Fixed
Closed: Sep 24
Pri: 3
Type: Bug

Blocked on:
issue 884331

Blocking:
issue 853518




tune mojo datapipe used for blob and service worker script reading

Project Member Reported by wanderview@chromium.org, Sep 6

Issue description

We have some reports that reading a response provided from cache_storage is slower than reading a response loaded from the network stack's HTTP cache.  I asked mek@ about it today and he pointed out that the network stack tunes its mojo datapipe to a 512KB buffer.  It looks like blob reading is probably using the default 64KB buffers.

Also, it looks like the blob reading code fills the entire pipe buffer before notifying the renderer that any data is available.  This could result in ping-pong behavior between the processes and reduced parallelism.  To address this we can also tune the blob reading code to read in chunk sizes smaller than the pipe buffer.

This bug will involve adding code to tune these values for blob reading and then performing some Finch experiments to see if they improve performance metrics.
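
To make the two knobs concrete, here is a minimal sketch against the Mojo C++ API of roughly this era.  This is not the actual CL; the option and flag names are from memory and may differ slightly from the tree:

  #include <algorithm>
  #include <cstdint>

  #include "base/logging.h"
  #include "mojo/public/cpp/system/data_pipe.h"

  // Create a pipe with a 512KB buffer (vs. the 64KB default) and write in
  // 32KB chunks so the consumer is signalled long before the buffer fills.
  mojo::ScopedDataPipeConsumerHandle MakeTunedPipeAndWrite(const char* data,
                                                           uint32_t size) {
    MojoCreateDataPipeOptions options;
    options.struct_size = sizeof(MojoCreateDataPipeOptions);
    options.flags = MOJO_CREATE_DATA_PIPE_OPTIONS_FLAG_NONE;
    options.element_num_bytes = 1;
    options.capacity_num_bytes = 512 * 1024;

    mojo::ScopedDataPipeProducerHandle producer;
    mojo::ScopedDataPipeConsumerHandle consumer;
    CHECK_EQ(MOJO_RESULT_OK,
             mojo::CreateDataPipe(&options, &producer, &consumer));

    uint32_t offset = 0;
    while (offset < size) {
      // Cap each write at 32KB.  Real code must also handle
      // MOJO_RESULT_SHOULD_WAIT by waiting for the pipe to become writable.
      uint32_t num_bytes = std::min<uint32_t>(32 * 1024, size - offset);
      MojoResult result = producer->WriteData(data + offset, &num_bytes,
                                              MOJO_WRITE_DATA_FLAG_NONE);
      if (result != MOJO_RESULT_OK)
        break;
      offset += num_bytes;
    }
    return consumer;
  }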
 
I did some initial experiments.  I populated cache_storage entries with Responses of different sizes and then ran this in the console:

  var c = await caches.open("foo");
  var results = [];
  async function doTest() {
    var r = await c.match("/");
    var start = performance.now();
    var t = await r.text();
    results.push(performance.now() - start);
  }
  for (let i = 0; i < 20; ++i) { await doTest(); }
  results.join("\n")

I then put the results in a spreadsheet for both our current default settings and a datapipe tuned to a 512KB buffer with 32KB chunks:

https://docs.google.com/spreadsheets/d/12VVrsY8zWVNhgbHo1C8-7QDMu6Y_sxvH4RElA4OIluA/edit?usp=sharing

On my VM instance there is no statistically significant difference for smaller response bodies.  At larger sizes like 255MB, though, there is a very strong improvement: read times drop from a median of 2.4 seconds to 1.5 seconds.

Seems worth pursuing.
  
This synthetic stress test:

  https://cache-api-perf.glitch.me/parallel.html?jobs=20&entries=100&type=path&bodies=20&reports=20

shows this result with the current defaults on my VM:

matchesPerSec - min: 1737 mean: 1972 max: 2264
bodyMegaBytesPerSec - min: 110 mean: 124 max: 137

With the 512KB pipe buffer and 32KB chunk size:

matchesPerSec - min: 503 mean: 628 max: 670
bodyMegaBytesPerSec - min: 169 mean: 209 max: 223

It seems we are better able to saturate the read rate with the tuned values.  In this test that leaves less time for cache matching, so the matchesPerSec drops.
Cc: falken@chromium.org shimazu@chromium.org bashi@chromium.org
Components: Blink>ServiceWorker
Summary: tune mojo datapipe used for blob and service worker script reading (was: tune mojo datapipe used for blob reading)
This same problem seems to affect service worker script loading as well:

https://cs.chromium.org/chromium/src/content/browser/service_worker/service_worker_installed_script_reader.cc?l=121&rcl=b1b8eb459ad1d34ffe35818f1229c14dc0018628

I'll look at tuning that here as well unless the service worker team would prefer to do it in a separate bug.  Loading these scripts should be very similar to reading scripts out of cache_storage blobs.
Another optimization we can make here is to create and start filling the response DataPipe immediately on the service worker thread:

https://chromium-review.googlesource.com/c/chromium/src/+/1214684

In the current code the service worker sends the GotResponse() mojo message back to the main thread, and the main thread then starts to load the blob.  If the main thread is busy with JS, the start of the blob read can be delayed for long periods.  By starting the read immediately we can at least fill the DataPipe while the main thread is janking.  Conceptually, the ordering change looks like the sketch below.
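
A rough sketch of that ordering (StartBlobRead and the GotResponse signature here are illustrative placeholders, not the CL's actual code):

  // Illustrative only.  Runs on the service worker thread once the fetch
  // handler responds with a blob.
  void OnResponseBlob(std::unique_ptr<storage::BlobDataHandle> blob) {
    MojoCreateDataPipeOptions options;
    options.struct_size = sizeof(MojoCreateDataPipeOptions);
    options.flags = MOJO_CREATE_DATA_PIPE_OPTIONS_FLAG_NONE;
    options.element_num_bytes = 1;
    options.capacity_num_bytes = 512 * 1024;  // the tuned capacity from above

    mojo::ScopedDataPipeProducerHandle producer;
    mojo::ScopedDataPipeConsumerHandle consumer;
    CHECK_EQ(MOJO_RESULT_OK,
             mojo::CreateDataPipe(&options, &producer, &consumer));

    // Start filling the pipe right here on the service worker thread; the
    // buffer accumulates data even while the main thread is janking.
    StartBlobRead(std::move(blob), std::move(producer));  // hypothetical helper

    // Hand only the consumer end back to the main thread, which drains it
    // whenever it next gets scheduled, with up to a full buffer ready.
    response_callback_->GotResponse(std::move(consumer));  // simplified signature
  }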
Cc: kinuko@chromium.org
Note the CL in #c5 basically disables side data reading.  I'm not sure of the best way to support that.  It would be a lot easier if the side data were read via a DataPipe instead of a separate message that produces a buffer callback.

Comment 8 Deleted

(re-posting from my chromium account)
re #c5: I'm a bit worried that reading the blob data in the worker thread potentially makes the service worker thread busier.  Instead of reading it in the service worker thread, ServiceWorkerSubresourceLoader (which dispatches a FetchEvent to the service worker thread from a background thread) might be able to call ReadAll on the background thread and send the datapipe back to the main thread.
Blocking: 882425
I split #c5 into a separate bug.  See bug 882425.
Blocking: -882425
Project Member Comment 13 by bugdroid1@chromium.org, Sep 13

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/25b9a8e66efcad49b254d4c0a1b09cb9a9c8a6b8

commit 25b9a8e66efcad49b254d4c0a1b09cb9a9c8a6b8
Author: Ben Kelly <wanderview@chromium.org>
Date: Thu Sep 13 20:29:39 2018

Tune the mojo::DataPipe capacity and read chunk size when reading blobs.

We have reports from some sites that data loaded from cache_storage via
a service worker is slower than when loaded from http cache.  One possible
explanation is that the mojo::DataPipes used for reading the BlobDataHandle
from cache_storage are not tuned like the DataPipes used by the network
loading path.  In the network loading path they use 512KB pipes and read in
32KB chunks.

This CL attempts to tune the various DataPipes used to read BlobDataHandle
objects.  It also includes some DataPipes in the service worker code that
are read in a similar way; for loading scripts, ReadableStreams, etc.

This CL uses 512KB pipes and 64KB read chunks as the default as those were
the best values in local testing.  It also adds features to control these
values in a future finch trial.  We can compare different values against
heartbeat metrics and site-specific metrics.

R=kinuko@chromium.org, mek@chromium.org, rkaplow@chromium.org

Bug:  881141 
Change-Id: If8d956ad195c6cf85547f1c7172f4a35972513e4
Reviewed-on: https://chromium-review.googlesource.com/1211739
Reviewed-by: Robert Kaplow <rkaplow@chromium.org>
Reviewed-by: Marijn Kruisselbrink <mek@chromium.org>
Reviewed-by: Kinuko Yasuda <kinuko@chromium.org>
Commit-Queue: Ben Kelly <wanderview@chromium.org>
Cr-Commit-Position: refs/heads/master@{#591135}
[modify] https://crrev.com/25b9a8e66efcad49b254d4c0a1b09cb9a9c8a6b8/content/browser/cache_storage/cache_storage_blob_to_disk_cache.cc
[modify] https://crrev.com/25b9a8e66efcad49b254d4c0a1b09cb9a9c8a6b8/content/browser/service_worker/service_worker_installed_script_reader.cc
[modify] https://crrev.com/25b9a8e66efcad49b254d4c0a1b09cb9a9c8a6b8/content/common/service_worker/service_worker_loader_helpers.cc
[modify] https://crrev.com/25b9a8e66efcad49b254d4c0a1b09cb9a9c8a6b8/storage/DEPS
[modify] https://crrev.com/25b9a8e66efcad49b254d4c0a1b09cb9a9c8a6b8/storage/browser/blob/mojo_blob_reader.cc
[modify] https://crrev.com/25b9a8e66efcad49b254d4c0a1b09cb9a9c8a6b8/testing/variations/fieldtrial_testing_config.json
[modify] https://crrev.com/25b9a8e66efcad49b254d4c0a1b09cb9a9c8a6b8/third_party/blink/common/blob/blob_utils.cc
[modify] https://crrev.com/25b9a8e66efcad49b254d4c0a1b09cb9a9c8a6b8/third_party/blink/public/common/blob/blob_utils.h
[modify] https://crrev.com/25b9a8e66efcad49b254d4c0a1b09cb9a9c8a6b8/third_party/blink/renderer/core/fetch/form_data_bytes_consumer.cc
[modify] https://crrev.com/25b9a8e66efcad49b254d4c0a1b09cb9a9c8a6b8/third_party/blink/renderer/core/fileapi/file_reader_loader.cc
[modify] https://crrev.com/25b9a8e66efcad49b254d4c0a1b09cb9a9c8a6b8/third_party/blink/renderer/modules/service_worker/fetch_respond_with_observer.cc

Note, these changes mainly help in the SW s13n mode.  The legacy mode uses a different blob loading approach which already gets the larger data pipes.
Blockedon: 884331
I need to investigate the performance regression next week.  Maybe the test is doing something unusual, or maybe there really is increased overhead from creating larger data pipes.

If it's the latter, maybe I can make a CL that creates the data pipes with either exactly the right capacity for the target blob, or the feature-configured capacity if the blob is too big.  This would be better for resource usage as well, so I should probably do this anyway.  A sketch of that capacity rule is below.
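
The rule itself is tiny (hypothetical helper name; not real Chromium code):

  #include <algorithm>
  #include <cstdint>

  // Use the blob's exact size when it fits, otherwise fall back to the
  // feature-configured capacity (e.g. 512KB), so small blobs don't pay
  // for a full-size pipe allocation.
  uint32_t ChoosePipeCapacity(uint64_t blob_size, uint32_t configured_capacity) {
    return static_cast<uint32_t>(
        std::min<uint64_t>(blob_size, configured_capacity));
  }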
📍 Couldn't reproduce a difference.
https://pinpoint-dot-chromeperf.appspot.com/job/14d78b1f640000
Status: Fixed (was: Assigned)
Marking this fixed for now.  If we do a Finch trial to further tune the values, I will open a new launch bug for the trial.
