New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 828665 link

Starred by 2 users

Issue metadata

Status: Fixed
Owner:
Last visit > 30 days ago
Closed: Apr 2018
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Windows , Mac
Pri: 2
Type: Bug-Regression



Sign in to add a comment

31ms startup performance regression in net::ct::STHDistributor::NewSTHObserved()

Project Member Reported by wittman@chromium.org, Apr 3 2018

Issue description

Data from the UMA Sampling Profiler shows that net::ct::STHDistributor::NewSTHObserved() regressed during browser process IO thread startup on 64-bit Chrome on Windows by 31ms. This occurred in the 67.0.3383.0 Canary release.

The responsible CL appears to be https://chromium.googlesource.com/chromium/src.git/+/5c442f87659b835707a5bdc9ed2e68915186d26e

Execution profile difference of Canary 67.0.3382.0 vs. 67.0.3383.0: https://uma.googleplex.com/p/chrome/callstacks?sid=c876a12d4a690da59713df7f32edab52
 
This appears to be due to changes in the release rather than the associated Finch experiment.

We also see the effects on Mac: https://uma.googleplex.com/p/chrome/callstacks?sid=82dc489d53d8d61114ea769d03c3a6f5
I expect this is because TreeStateTracker is no longer discarding all STHs it receives, but instead is passing them to a SingleTreeTracker which verifies their signatures. This shouldn't need to happen during startup though. The STHDistributor should be changed to delay its work until after startup.
The 31ms are indeed being spent verifying the STH signatures. This is being done on the I/O thread; is there another way of scheduling this task so that it doesn't have an effect on startup?

For now, I'll disable verifying the STH signatures if the CertificateTransparencyLogAuditing feature isn't enabled, since they aren't used for anything in that case.
@robpervical: 31ms is surprisingly expensive. We should look into exploring that more, but yes, we should also get that off the IO thread. Working through the threading requirements will be a bit more complex because of the interactions. Separately, I had been looking at moving the SCT parsing and validation onto the certificate worker thread and off the IO thread, so this may be aligned in that regard.
To be clear, the 31ms number is average time spent executing this function during the first 30 seconds of startup, over all clients and invocations.

Breaking down the average (see graph), there's a small number of users at high percentiles that are spending quite a long time executing this function. The 99.9th percentile is at 1.8 seconds.
NewSTHObservedPercentiles.png
72.7 KB View Download
The profiler says the cost is from ECDSA verification, which is not the fastest thing in the world, but it is already an operation we do on the I/O thread in several places. My laptop measures it at 3.7k ops/sec in the OPENSSL_SMALL build configuration that Chrome uses, which is 100x faster than the 31ms being measured here.

Are you all verifying a bunch of signatures or just one?

We have another build configuration that can do 8.5k ops/sec, but that costs a 148KiB of binary size. We may be able to get somewhere in between without the binary size hit, but the upper bound for that is 2.3x, and it sounds like there's a bigger discrepancy between the benchmarks and what we're observing.
> We may be able to get somewhere in between

I should note that the "somewhere in between" would only help x86_64 since it depends on assembly. Probably the better question is why the benchmark discrepancy.

PS: Ed25519 is much nicer than ECDSA P-256. Probably a bit late to redo CT using that, but maybe worth investing in now. We currently hit 14k ops/sec on Ed25519 even without giant tables or assembly.
If this is expected to be fast but for some users it's really slow, is it possible this is due to some third party hooking?
If this code was hooked we would expect the hook implementation functions to show up in the profiles as unsymbolized frames, but we don't see that here.
Right now, it'll be verifying 27 signatures (one for each of the CT logs in https://cs.chromium.org/chromium/src/net/data/ssl/certificate_transparency/log_list.json).
27 signatures at 3.7kops/sec should take 7ms, which is still better than 31ms, but that's close enough that differences between machines seems more plausible.
What is this verifying on startup? Is it static data, or do you have STH's on-disk that you want to check?
We get STHs every 24 hours. The verifying on startup is not an intentional thing though, and may be due to the component being loaded from disk, notifying about STHs available, and starting verification early during process startup. We can defer that though.
Project Member

Comment 14 by bugdroid1@chromium.org, Apr 13 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/eb780fd434496084168f4aade94cfb4e571d5225

commit eb780fd434496084168f4aade94cfb4e571d5225
Author: Rob Percival <robpercival@chromium.org>
Date: Fri Apr 13 17:43:25 2018

Do not create TreeStateTracker if kCTLogAuditing feature is disabled

This avoids distributing STHs and SCTs to it and spending time verifying STH signatures
when they won't be used for anything.

Bug:  828665 
Change-Id: I8cb6b67fcd12d9e4f875331d1eae9c44d9e74925
Reviewed-on: https://chromium-review.googlesource.com/1009913
Reviewed-by: Ryan Sleevi <rsleevi@chromium.org>
Commit-Queue: Rob Percival <robpercival@chromium.org>
Cr-Commit-Position: refs/heads/master@{#550675}
[modify] https://crrev.com/eb780fd434496084168f4aade94cfb4e571d5225/chrome/browser/io_thread.cc
[modify] https://crrev.com/eb780fd434496084168f4aade94cfb4e571d5225/chrome/browser/io_thread_browsertest.cc
[modify] https://crrev.com/eb780fd434496084168f4aade94cfb4e571d5225/components/certificate_transparency/tree_state_tracker.cc

With the previous change we see a 50% decrease in aggregate runtime in STHDistributor::NewSTHObserved() in 68.0.3397.0: https://uma.googleplex.com/p/chrome/callstacks?sid=9e0b3fb0e169d93125c459d570673ca9
Cc: gab@chromium.org fdoray@chromium.org
gab/fdoray: Could you advise on best practice here? I couldn't find good documentation for the new scheduler for problems of this sort, but found some possible leads.

The behaviour we're talking about in this bug is related to https://chromium.googlesource.com/chromium/src/+/be8138a4eb66bddb5bca81e08c13c6cb1dafc981/chrome/browser/component_updater/sth_set_component_installer.h

In particular, the ComponentReady() triggers a call to LoadSTHsFromDisk, which loads the STHs, then calls OnJsonParseSuccess, and posts the loaded STH to the IOThread, which does the STH verification, which is where this perf time is spent.

1) Is the WeakPtr usage with how ComponentReady PostTasks to LoadSTHsFromDisk safe? I wasn't sure with all the scheduler changes whether any invariants are violated because of the BACKGROUND affinity, since the weak_ptr_factory is initially created on a higher priority sequence (... I'm hoping it is)
2) I'm guessing the recommended practice would be to use BrowserThread::PostAfterStartupTask. Is that the desired method?
3) Since it requires an explicit task runner, and doesn't allow task traits to be passed (AFAICT), is SequencedTaskRunnerHandle::Get() as the task runner, bound to a WeakPtr that then chains to a method which does PostTask to LoadSTHsFromDisk with BACKGROUND affinity the 'right' solution?
Project Member

Comment 17 by bugdroid1@chromium.org, Apr 17 2018

Labels: merge-merged-testbranch
The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/eb780fd434496084168f4aade94cfb4e571d5225

commit eb780fd434496084168f4aade94cfb4e571d5225
Author: Rob Percival <robpercival@chromium.org>
Date: Fri Apr 13 17:43:25 2018

Do not create TreeStateTracker if kCTLogAuditing feature is disabled

This avoids distributing STHs and SCTs to it and spending time verifying STH signatures
when they won't be used for anything.

Bug:  828665 
Change-Id: I8cb6b67fcd12d9e4f875331d1eae9c44d9e74925
Reviewed-on: https://chromium-review.googlesource.com/1009913
Reviewed-by: Ryan Sleevi <rsleevi@chromium.org>
Commit-Queue: Rob Percival <robpercival@chromium.org>
Cr-Commit-Position: refs/heads/master@{#550675}
[modify] https://crrev.com/eb780fd434496084168f4aade94cfb4e571d5225/chrome/browser/io_thread.cc
[modify] https://crrev.com/eb780fd434496084168f4aade94cfb4e571d5225/chrome/browser/io_thread_browsertest.cc
[modify] https://crrev.com/eb780fd434496084168f4aade94cfb4e571d5225/components/certificate_transparency/tree_state_tracker.cc

1) WeakPtrs are not safe to use from different sequences. base::PostTaskWithTraits(FROM_HERE, {...}, base::BindOnce(..., weak_ptr_factory_.GetWeakPtr())) is always incorrect, because the WeakPtr is obtained from the posting sequence and dereferenced when the task is scheduled in the thread pool, on no specific sequence.

The recommended solution is to post a background task that doesn't access external state (i.e. all the state it needs is passed by value to the callback) and handle the result of that task in a reply callback that runs in the origin sequence. E.g.:

base::PostTaskWithTraitsAndReplyWithResult(
  FROM_HERE,
  {base::MayBlock(), base::TaskPriority::BACKGROUND},
  base::BindOnce(&LoadSTHsFromDisk, arg1, arg2), // not bound to a weak ptr
  base::BindOnce(&STHSetComponentInstallerPolicy::HandleLoadSTHsFromDiskResult,
                 weak_ptr_factory_.GetWeakPtr()) 
                // bound to a weak ptr, but that's ok because it runs on the origin sequence 
);

2) Using a BACKGROUND priority reduces perf impact on more important work, but it's not perfect yet. Posting to an AfterStartupTaskUtils::Runner initialized with a CreateTaskRunnerWithTraits({base::TaskPriority::BACKGROUND}) TaskRunner is the best you can do to reduce impact on more important work.

E.g.

auto tr = AfterStartupTaskUtils::Runner(CreateTaskRunnerWithTraits({base::TaskPriority::BACKGROUND}));
base::PostTaskAndReplyWithResult(tr.get(), FROM_HERE, ...);

3) The right solution is described in 2).
Thanks! Super-helpful, so I added you to https://chromium-review.googlesource.com/c/chromium/src/+/1015518 to get your feedback :)
Project Member

Comment 20 by bugdroid1@chromium.org, Apr 19 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/013c7af6505fc9bd7341b92f770b80d3888f7af0

commit 013c7af6505fc9bd7341b92f770b80d3888f7af0
Author: Ryan Sleevi <rsleevi@chromium.org>
Date: Thu Apr 19 02:52:40 2018

Defer STH loading until after startup

The STHSet component receives periodic updates of new STHs from known
logs. The process for the update involves receiving the component,
loading the files from disk, then notifying the IO thread of the
received STHs. On the IO thread, it verifies the STHs against the log's
public key, and if valid, processes that STH.

On a fresh install, this process will normally be well after the browser
has started. However, on a browser restart, this may trigger during
the browser startup path, as components are loaded.

As a result, defer the processing of the files from disk until after the
browser has started up, to reduce IO contention.

In addition, the current threading patterns involved three task runners - the
UI thread (which installs the components), the background thread doing
the file IO, and the IO thread validating the STHs. However, the
threading patterns weren't safe - primarily for situations of shutdown
in which the IO thread is in the process of being destructed. This
rearranges the thread sequencing to ensure that all objects have
defined, safe lifetimes, by changing the ownership model for the
STHObserver and hiding the threading details as an internal
implementation detail.

BUG=834008,  828665 

Change-Id: Iea850b4ee3348b9339faea52e14f11965709a5b7
Reviewed-on: https://chromium-review.googlesource.com/1015518
Commit-Queue: Ryan Sleevi <rsleevi@chromium.org>
Reviewed-by: Sorin Jianu <sorin@chromium.org>
Reviewed-by: François Doray <fdoray@chromium.org>
Reviewed-by: Eric Roman <eroman@chromium.org>
Cr-Commit-Position: refs/heads/master@{#551927}
[modify] https://crrev.com/013c7af6505fc9bd7341b92f770b80d3888f7af0/chrome/browser/component_updater/sth_set_component_installer.cc
[modify] https://crrev.com/013c7af6505fc9bd7341b92f770b80d3888f7af0/chrome/browser/component_updater/sth_set_component_installer.h
[modify] https://crrev.com/013c7af6505fc9bd7341b92f770b80d3888f7af0/chrome/browser/component_updater/sth_set_component_installer_unittest.cc

Status: Fixed (was: Assigned)
Marking this as Fixed as of that last commit - wittman@, feel free to reopen if you see further regressions, but we should hopefully see an improvement for the other 50% (i.e. those opted in to the log auditing trial) based on the new setup, which allows for interleaving the work and also deferring it until after startup.

Sign in to add a comment