New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 642082 link

Starred by 1 user

Issue metadata

Status: Fixed
Owner:
Closed: Sep 2016
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Android
Pri: 2
Type: ----



Sign in to add a comment

Optimize DER cache for memory

Project Member Reported by dskiba@chromium.org, Aug 29 2016

Issue description

I'm working on reducing amount of duplicate memory in Clank, and noticed that GetDER is responsible for 840 KiB of duplicate data (i.e. at least half of that is duplicate) in a test where 22 URLs are opened in 10 sec intervals in a single tab. In more usual scenarios like "open theverge.com, click on an article" GetDER is responsible for 96 KiB. Duplicates are found by periodically hashing all allocated regions, see here: https://groups.google.com/a/chromium.org/d/msg/project-trim/EiKXwnbTvbo/0z-Y5Kg1CgAJ

So I instrumented GetDER and DERCache to understand how much duplicates we have exactly. Code is here: https://codereview.chromium.org/2289943002 It measures total memory consumption as well as memory consumption by unique DERs only (i.e. ideal case). It also measures cache hit rate, i.e. hits / (misses + hits).

Opening TheVerge.com and clicking on the top article produces the following:
"DERCache: 310 KiB @ 244 instances (116 KiB @ 84 unique), hit rate: 78.7% (3931/1064)"

So, simply visiting two webpages (albeit heavy) wasted ~190 KiB in duplicate DER data.


Repeating same 22 URL test gives the following:
"DERCache: 1344 KiB @ 1027 instances (292 KiB @ 209 unique), hit rate: 86.5% (21999/3432)"

More that 1 MiB wasted. What is interesting here is that even 30 mins after closing the single tab DERCache was still holding 969 KiB @ 746 instances (290 KiB @ 208 unique).


So it looks like we can save some memory here by storing all DER encoded certificates in a global map. High hit rate should hide all additional costs associated with locking/adding/removing from the map.

 

Comment 1 by dskiba@chromium.org, Aug 29 2016

I implemented global DER cache in https://codereview.chromium.org/2289933002 With this change DER data memory consumption is "ideal", i.e. we store any given unique DER data only once.

To get a sense of performance regression I added a time histogram to GetDER() and performed the same "open theverge.com, click on first article" case several times. Results (Nexus 5X):

* Without the change GetDER() took 5.8 microsecond on average
* With the change GetDER() took 7.3 microsecond on average

I.e. GetDER() time regressed by 25%. But (hopefully) it's still small enough to be acceptable.

Comment 2 by dskiba@chromium.org, Aug 29 2016

Histograms for #1.
master-histograms.txt
19.3 KB View Download
patch-histograms.txt
19.5 KB View Download

Comment 3 by dskiba@chromium.org, Aug 29 2016

Cc: davidben@chromium.org
Components: Internals>Network>SSL
Cc: cbentzel@chromium.org
Note: I don't think we want to proceed with this CL as-is; we already have code that tries to de-duplicate certs in https://cs.chromium.org/chromium/src/net/cert/x509_certificate.cc?rcl=0&l=51

If you're finding duplicates in the DER cache, it suggests that there is a bug or issue in that code. I'd like to avoid duplicating caches if at all possible.

Comment 6 by dskiba@chromium.org, Aug 30 2016

Yes, I'm aware of that cache. In fact, most of GetDER() calls that resulted in duplication were made from IsSameOSCert() function called from there.

As I understand, things potentially get inserted in that cache when X509Certificate::CreateFromHandle() is called. One of the places where it's called is SSLClientSocketImpl::PeerCertificateChain::AsOSChain(). However, even though resulting X509Certificate holds references to cached certs, the chain it was created from (SSL_get_peer_cert_chain(ssl_)) is not freed and is still being held by ssl_ object. I.e. we called GetDER() on both certs (cached DER data on the new one from ssl_), then found and used cached cert, but left new one hang around retaining DER data.

Then I discovered that GetDER() is (potentially) called from other places too, like SSLClientSocketImpl::CertVerifyCallback(), and decided that it's safer to limit changes to GetDER().

I agree that one cache is better than two (although we already had per-cert DER cache), and there is definitely something going on here, especially considering that 30 mins after closing the only tab there were still 746 SSL certs hanging around (down from 1027, but still).

But just in case there is this fix that trades some performance for some memory.
In that case, this sounds like very much the wrong fix. Instead we just shouldn't use a caching GetDER call for certificates that haven't been deduplicated yet since those will never have GetDER called on them more than once.
"In fact, most of GetDER() calls that resulted in duplication were made from IsSameOSCert() function called from there."

Ah, and there is arguably our bug - which is we should have an optimization in https://cs.chromium.org/chromium/src/net/cert/x509_certificate.cc?rcl=0&l=138 to indicate we don't need to cache the DER.

The assumption here of X509CertificateCache is that the caller is going to give up their reference for the new reference, but as you note, this isn't true for the PeerCertificateChain case because the SSL* is still going to hold its copy. However, in that case, we'd only ever be calling GetDER() on the SSL-provided copy once.

An alternative solution (without the same performance penalty, and hopefully with significantly less memory overhead) would be to call the i2d_ functions directly in https://cs.chromium.org/chromium/src/net/socket/ssl_client_socket_impl.cc?rcl=0&l=453 , which would fix both the short-circuit *and* the general case (e.g. this could improve more than Android, and simplify the code)

try removing the #ifdef entirely, and AsOSChain() would become:

scoped_refptr<X509Certificate>
SSLClientSocketImpl::PeerCertificateChain::AsOSChain() const {
  // DER-encode the chain and convert to a platform certificate handle.
  std::vector<std::string> raw_chain;
  std::vector<base::StringPiece> temp_chain;
  for (size_t i = 0; i < sk_X509_num(openssl_chain_.get()); ++i) {
    X509* x = sk_X509_value(openssl_chain_.get(), i);
    int len = i2d_X509(x509, nullptr);
    if (len < 0)
      return nullptr;
    string cert;
    uint8_t* ptr = reinterpret_cast<uint8_t*>(base::WriteInto(&cert, len + 1));
    if (i2d_X509(x509, &ptr) < 0) {
      NOTREACHED();
      return nullptr;
    }
    raw_chain.emplace_back(cert);
    temp_chain.push_back(raw_chain.back());
  }
  return X509Certificate::CreateFromDERCertChain(temp_chain);
}


That should, AFAIK, be the only place where we're converting a shared X509* into an X509Certificate; every other conversion of an X509* into an X509Certificate (e.g. X509Certificate::CreateFromHandle) should be working on a singly-owned X509*, which means that any GetDER() overhead will be immediately freed.

Could you try measuring with that and seeing if you see the same overheads?

Comment 9 by dskiba@chromium.org, Aug 30 2016

But that means leaking internal detail (the fact that IsSameOSCert calls GetDER, which caches) to higher-level code?

Actually, a silly question: maybe we don't need X509CertificateCache in USE_OPENSSL_CERTS case at all?
Re #8: sure, I'll measure that solution tomorrow.
I said "without performance penalty", although I suppose the above code has the downside of forcing an additional (unused) certificate parse in the pass.

There are a number of ways we could solve this, without needing to introduce an (additional) lock; I suppose one question is: Is there any duplication you can track that *isn't* from that PeerCertificateChain call? I'm not aware of cases where we'd end up sharing an X509* with any OpenSSL-managed routines outside of that.
If we removed the X509CertificateCache, you'd likely see much more memory duplication in Chrome overall, because of the identity-assumption of X509Certificate::OSCertHandle.

That is, there's far greater usage of X509Certificate for ::OSCertHandle's *not* shared with an SSL* (such as from the HTTP cache) than there are with cases that are shared (the case you're seeing).
And the above code I mentioned was mostly to validate the idea that it's only PeerCertificateChain causing this; then we can figure out the right way, such as whether IsSameOSCert can/should take overloads to take in StringPeice data, which would be able to save an unnecessary certificate parse on all platforms. That's a complexity tradeoff that might be worth the (cross-platform) performance gains, if it looks promising.
Re #12, #13, makes sense, thanks.

So here are all places that called GetDER from my "22 URLs" test:

840.7 KiB net::x509_util::GetDER(x509_st*, base::BasicStringPiece<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >*)

681.5 KiB   net::X509Certificate::IsSameOSCert(x509_st*, x509_st*)
681.5 KiB     InsertOrUpdate
681.5 KiB       net::X509Certificate::X509Certificate(x509_st*, std::__1::vector<x509_st*, std::__1::allocator<x509_st*> > const&)
681.5 KiB         net::X509Certificate::CreateFromHandle(x509_st*, std::__1::vector<x509_st*, std::__1::allocator<x509_st*> > const&) 
678.5 KiB           net::SSLClientSocketImpl::PeerCertificateChain::AsOSChain() const
3.0 KiB             net::X509Certificate::CreateFromDERCertChain(std::__1::vector<base::BasicStringPiece<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, std::__1::allocator<base::BasicStringPiece<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > > const&)

103.4 KiB   net::SSLClientSocketImpl::DoVerifyCert(int)
103.4 KiB     net::SSLClientSocketImpl::DoHandshakeLoop(int)
103.4 KiB       net::SSLClientSocketImpl::OnHandshakeIOComplete(int)

51.3 KiB    net::X509Certificate::GetDEREncoded(x509_st*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >*)
41.0 KiB      net::CertVerifier::RequestParams::RequestParams(scoped_refptr<net::X509Certificate>, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::vector<scoped_refptr<net::X509Certificate>, std::__1::allocator<scoped_refptr<net::X509Certificate> > >)
40.0 KiB        net::SSLClientSocketImpl::DoVerifyCert(int)
1.0 KiB         net::ProofVerifierChromium::Job::DoVerifyCert(int)
10.4 KiB      net::ct::GetX509LogEntry(x509_st*, net::ct::LogEntry*)
10.4 KiB        net::MultiLogCTVerifier::Verify(net::X509Certificate*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, net::ct::CTVerifyResult*, net::BoundNetLog const&)

2.8 KiB     net::X509Certificate::CalculateCAFingerprint256(std::__1::vector<x509_st*, std::__1::allocator<x509_st*> > const&)
2.8 KiB       net::X509Certificate::CalculateChainFingerprint256(x509_st*, std::__1::vector<x509_st*, std::__1::allocator<x509_st*> > const&)
2.8 KiB         content::CertStoreImpl::StoreCert(net::X509Certificate*, int)

1.6 KiB     net::X509Certificate::WriteOSCertHandleToPickle(x509_st*, base::Pickle*)
1.6 KiB       net::X509Certificate::Persist(base::Pickle*)
1.6 KiB         net::HttpResponseInfo::Persist(base::Pickle*, bool, bool) const

Yeah, the remaining calls are all operating on an X509Certificate-owned OSCertHandle (either incoming or outgoing) except for the ::AsOSChain(), so I would expect you should see the 'duplicate' data drop once we fix that place.

For context of why we added the DER cache, unless Davidben has new info with BoringSSL, 6 years ago it was showing up pretty high on CPU usage based on Chrome's usage pattern. On other platforms, whenever a certificate is parsed, the original form is kept alongside next to it. BoringSSL has the 'enc' member of the X509's cert_info ( https://cs.chromium.org/chromium/src/third_party/boringssl/src/include/openssl/x509.h?rcl=0&l=243 -> https://cs.chromium.org/chromium/src/third_party/boringssl/src/include/openssl/x509.h?rcl=1472469725&l=223 -> https://cs.chromium.org/chromium/src/third_party/boringssl/src/include/openssl/asn1.h?rcl=1472469725&l=245 ) which is supposed to store the original data, but i2d_X509 still re-enocdes, rather than using cert->cert_info->enc

So it does mean we're storing *two* copies of the parsed cert for every X509 in an X509Certificate. Not ideal, and I think Davidben had thoughts on adding to the BoringSSL API to extract this data directly, but it's somewhat coupled to whether X509* is mutable (it is today, unfortunately; on other platforms, the ::OSCertHandle is not "technically" mutable)
Slight correction: i2d_X509 uses cert->cert_info->enc, but it re-encodes the outer envelope. It's possible that was enough to show up fairly high up there?

We're sadly not in a good place with in-memory certificate representations right now and need to set fire to OpenSSL's legacy X.509 code (it's some of the worst code in all of OpenSSL, which is really saying something), but that's a rather involved long-term project. I may try to figure out enough of that mess to have it store the Certificate rather than TBSCertificate representation, but the cost of changing that code is extremely high, so this is almost certainly a bad idea.
So with the change in #8 situation improved ~2x:

"TheVerge.com test":
DERCache: 173 KiB @ 120 instances (99 KiB @ 71 unique), hit rate: 76.3% (3325/1030)
(before: 310 KiB @ 244 instances)

"22 URLs test":
DERCache: 568 KiB @ 384 instances (250 KiB @ 176 unique), hit rate: 77.5% (17593/5121)
(before: 1344 KiB @ 1027 instances)

I took a trace and found that SSLClientSocketImpl::DoVerifyCert() calls GetDER() on server_cert_chain_.Get(0), i.e. we touch same certificate that previously was touched by SSLClientSocketImpl::PeerCertificateChain::AsOSChain(). So I simply commented that whole block (GetDER() + IsAllowedBadCert()), and voila:

DERCache: 223 KiB @ 154 instances (223 KiB @ 154 unique), hit rate: 78.3% (14153/3931)

on "22 URLs" test.

BTW, SSLConfig::IsAllowedBadCert() does nothing with passed DER when allowed_bad_certs is empty, which I assume is often? If so, there is an opportunity for an optimization (for both IsAllowedBadCert() overloads).
Owner: rsleevi@chromium.org
Status: Assigned (was: Untriaged)
I'll send a CL that does the needful and fixes up IsAllowedBadCert(), now that we no longer have to worry about //remoting's weird sandboxing (which is why it used DER to begin with)
Thanks for fixing this the proper way! The CL (crrev.com/2300533002) failed several times in a row - was it just bad luck, or are there any issues? Logs are expired, can't tell myself...
Oh gosh, I hadn't realized it hadn't landed. Thanks for poking. I think it was just bad luck (it looked like a date issue with one device in the swarm), but if it fails again, I'll dig in.
Project Member

Comment 21 by bugdroid1@chromium.org, Sep 13 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/74e99744fe3d88a5b4a759818fa813cd2d091bd7

commit 74e99744fe3d88a5b4a759818fa813cd2d091bd7
Author: rsleevi <rsleevi@chromium.org>
Date: Tue Sep 13 20:35:25 2016

Stop caching DER-encoded certificates unnecessarily

What was intended as a CPU performance short-circuit
in SSLClientSocket for Open/BoringSSL implementations
ended up introducing rather unfortunate and
significant memory overhead, thus proving yet again
that premature optimization is the root of all evil.

While cleaning the unnecessary DER-conversion/caching
up, a further optimization to reduce the overhead
of adding certificate errors was pointed out, allowing
us to avoid the overhead of storing another copy of the
DER-encoded certificate. As //remoting-in-NaCl can now
use BoringSSL directly, this unwinds
http://crrev.com/93153 and uses the cached certificate
directly.

BUG= 642082 
R=davidben@chromium.org, sergeyu@chromium.org

Review-Url: https://codereview.chromium.org/2300533002
Cr-Commit-Position: refs/heads/master@{#418352}

[modify] https://crrev.com/74e99744fe3d88a5b4a759818fa813cd2d091bd7/net/http/http_stream_factory_impl_job.cc
[modify] https://crrev.com/74e99744fe3d88a5b4a759818fa813cd2d091bd7/net/socket/ssl_client_socket_impl.cc
[modify] https://crrev.com/74e99744fe3d88a5b4a759818fa813cd2d091bd7/net/socket/ssl_client_socket_unittest.cc
[modify] https://crrev.com/74e99744fe3d88a5b4a759818fa813cd2d091bd7/net/socket/ssl_server_socket_unittest.cc
[modify] https://crrev.com/74e99744fe3d88a5b4a759818fa813cd2d091bd7/net/ssl/ssl_config.cc
[modify] https://crrev.com/74e99744fe3d88a5b4a759818fa813cd2d091bd7/net/ssl/ssl_config.h
[modify] https://crrev.com/74e99744fe3d88a5b4a759818fa813cd2d091bd7/remoting/protocol/ssl_hmac_channel_authenticator.cc

Hurray!
Status: Fixed (was: Assigned)
Labels: M-55
Project Member

Comment 25 by bugdroid1@chromium.org, Feb 28 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/50fee4d87b534d63aa9769d1c593c368a320edcd

commit 50fee4d87b534d63aa9769d1c593c368a320edcd
Author: davidben <davidben@chromium.org>
Date: Tue Feb 28 02:12:55 2017

Restore SSL_SESSION/X509Certificate X509* sharing

This effectively reverts https://codereview.chromium.org/2300533002
and replaces it with a smarter X.509 representation. We haven't
gotten rid of X509* completely yet, but we're far enough along there
that we can improve this.

There are three flavors of X509* that Chromium potentially keeps
around in memory:

1. Vanilla X509*. This does not make the full DER form easily
   accessible, but does cache the DER form of the
   TBSCertificate.

2. X509* + DERCache. This is (1) with an extra copy of the
   full DER form accessible. The DER form is accessible (needed
   by lots of things) but we use a bunch more memory.

3. X509* + CRYPTO_BUFFER. This is a smarter version of (2). The
   full DER form is stored in a CRYPTO_BUFFER but rather than
   have it waste memory, we alias the cached TBSCertificate into
   it.

(3) was added early on in switching from X509 to CRYPTO_BUFFER.
When https://codereview.chromium.org/2300533002 was done, (3) did
not exist, so we had this split where X509s hanging off
SSL_SESSION did not need DERCache but net::X509Certificate did.
Sharing the X509* was a nice memory optimization in one direction
(fewer X509s in memory---X509s are really *really* inefficient,
which is one of the motivations in removing them), but at the
cost of stapling DERCache to more things.

Whenever the SSL_SESSION's reference outlives the
net::X509Certificate's reference, 2300533002 is a win. When both
are alive in memory, it is a loss.

Now, every SSL_SESSION-owned X509 is of type (3) anyway, so we can
be smarter. Undo the X509*-side regression and instead just don't
staple DERCache onto X509*s of type (3). They already have a free
DERCache on them. This achieves 2300533002's goals without the
X509* tradeoff.

BUG= 671420 , 642082 

Review-Url: https://codereview.chromium.org/2694903006
Cr-Commit-Position: refs/heads/master@{#453455}

[modify] https://crrev.com/50fee4d87b534d63aa9769d1c593c368a320edcd/net/cert/x509_util_openssl.cc
[modify] https://crrev.com/50fee4d87b534d63aa9769d1c593c368a320edcd/net/socket/ssl_client_socket_impl.cc

Sign in to add a comment