Optimize DER cache for memory |
||||||
Issue descriptionI'm working on reducing amount of duplicate memory in Clank, and noticed that GetDER is responsible for 840 KiB of duplicate data (i.e. at least half of that is duplicate) in a test where 22 URLs are opened in 10 sec intervals in a single tab. In more usual scenarios like "open theverge.com, click on an article" GetDER is responsible for 96 KiB. Duplicates are found by periodically hashing all allocated regions, see here: https://groups.google.com/a/chromium.org/d/msg/project-trim/EiKXwnbTvbo/0z-Y5Kg1CgAJ So I instrumented GetDER and DERCache to understand how much duplicates we have exactly. Code is here: https://codereview.chromium.org/2289943002 It measures total memory consumption as well as memory consumption by unique DERs only (i.e. ideal case). It also measures cache hit rate, i.e. hits / (misses + hits). Opening TheVerge.com and clicking on the top article produces the following: "DERCache: 310 KiB @ 244 instances (116 KiB @ 84 unique), hit rate: 78.7% (3931/1064)" So, simply visiting two webpages (albeit heavy) wasted ~190 KiB in duplicate DER data. Repeating same 22 URL test gives the following: "DERCache: 1344 KiB @ 1027 instances (292 KiB @ 209 unique), hit rate: 86.5% (21999/3432)" More that 1 MiB wasted. What is interesting here is that even 30 mins after closing the single tab DERCache was still holding 969 KiB @ 746 instances (290 KiB @ 208 unique). So it looks like we can save some memory here by storing all DER encoded certificates in a global map. High hit rate should hide all additional costs associated with locking/adding/removing from the map.
,
Aug 29 2016
Histograms for #1.
,
Aug 29 2016
,
Aug 29 2016
,
Aug 29 2016
Note: I don't think we want to proceed with this CL as-is; we already have code that tries to de-duplicate certs in https://cs.chromium.org/chromium/src/net/cert/x509_certificate.cc?rcl=0&l=51 If you're finding duplicates in the DER cache, it suggests that there is a bug or issue in that code. I'd like to avoid duplicating caches if at all possible.
,
Aug 30 2016
Yes, I'm aware of that cache. In fact, most of GetDER() calls that resulted in duplication were made from IsSameOSCert() function called from there. As I understand, things potentially get inserted in that cache when X509Certificate::CreateFromHandle() is called. One of the places where it's called is SSLClientSocketImpl::PeerCertificateChain::AsOSChain(). However, even though resulting X509Certificate holds references to cached certs, the chain it was created from (SSL_get_peer_cert_chain(ssl_)) is not freed and is still being held by ssl_ object. I.e. we called GetDER() on both certs (cached DER data on the new one from ssl_), then found and used cached cert, but left new one hang around retaining DER data. Then I discovered that GetDER() is (potentially) called from other places too, like SSLClientSocketImpl::CertVerifyCallback(), and decided that it's safer to limit changes to GetDER(). I agree that one cache is better than two (although we already had per-cert DER cache), and there is definitely something going on here, especially considering that 30 mins after closing the only tab there were still 746 SSL certs hanging around (down from 1027, but still). But just in case there is this fix that trades some performance for some memory.
,
Aug 30 2016
In that case, this sounds like very much the wrong fix. Instead we just shouldn't use a caching GetDER call for certificates that haven't been deduplicated yet since those will never have GetDER called on them more than once.
,
Aug 30 2016
"In fact, most of GetDER() calls that resulted in duplication were made from IsSameOSCert() function called from there." Ah, and there is arguably our bug - which is we should have an optimization in https://cs.chromium.org/chromium/src/net/cert/x509_certificate.cc?rcl=0&l=138 to indicate we don't need to cache the DER. The assumption here of X509CertificateCache is that the caller is going to give up their reference for the new reference, but as you note, this isn't true for the PeerCertificateChain case because the SSL* is still going to hold its copy. However, in that case, we'd only ever be calling GetDER() on the SSL-provided copy once. An alternative solution (without the same performance penalty, and hopefully with significantly less memory overhead) would be to call the i2d_ functions directly in https://cs.chromium.org/chromium/src/net/socket/ssl_client_socket_impl.cc?rcl=0&l=453 , which would fix both the short-circuit *and* the general case (e.g. this could improve more than Android, and simplify the code) try removing the #ifdef entirely, and AsOSChain() would become: scoped_refptr<X509Certificate> SSLClientSocketImpl::PeerCertificateChain::AsOSChain() const { // DER-encode the chain and convert to a platform certificate handle. std::vector<std::string> raw_chain; std::vector<base::StringPiece> temp_chain; for (size_t i = 0; i < sk_X509_num(openssl_chain_.get()); ++i) { X509* x = sk_X509_value(openssl_chain_.get(), i); int len = i2d_X509(x509, nullptr); if (len < 0) return nullptr; string cert; uint8_t* ptr = reinterpret_cast<uint8_t*>(base::WriteInto(&cert, len + 1)); if (i2d_X509(x509, &ptr) < 0) { NOTREACHED(); return nullptr; } raw_chain.emplace_back(cert); temp_chain.push_back(raw_chain.back()); } return X509Certificate::CreateFromDERCertChain(temp_chain); } That should, AFAIK, be the only place where we're converting a shared X509* into an X509Certificate; every other conversion of an X509* into an X509Certificate (e.g. X509Certificate::CreateFromHandle) should be working on a singly-owned X509*, which means that any GetDER() overhead will be immediately freed. Could you try measuring with that and seeing if you see the same overheads?
,
Aug 30 2016
But that means leaking internal detail (the fact that IsSameOSCert calls GetDER, which caches) to higher-level code? Actually, a silly question: maybe we don't need X509CertificateCache in USE_OPENSSL_CERTS case at all?
,
Aug 30 2016
Re #8: sure, I'll measure that solution tomorrow.
,
Aug 30 2016
I said "without performance penalty", although I suppose the above code has the downside of forcing an additional (unused) certificate parse in the pass. There are a number of ways we could solve this, without needing to introduce an (additional) lock; I suppose one question is: Is there any duplication you can track that *isn't* from that PeerCertificateChain call? I'm not aware of cases where we'd end up sharing an X509* with any OpenSSL-managed routines outside of that.
,
Aug 30 2016
If we removed the X509CertificateCache, you'd likely see much more memory duplication in Chrome overall, because of the identity-assumption of X509Certificate::OSCertHandle. That is, there's far greater usage of X509Certificate for ::OSCertHandle's *not* shared with an SSL* (such as from the HTTP cache) than there are with cases that are shared (the case you're seeing).
,
Aug 30 2016
And the above code I mentioned was mostly to validate the idea that it's only PeerCertificateChain causing this; then we can figure out the right way, such as whether IsSameOSCert can/should take overloads to take in StringPeice data, which would be able to save an unnecessary certificate parse on all platforms. That's a complexity tradeoff that might be worth the (cross-platform) performance gains, if it looks promising.
,
Aug 30 2016
Re #12, #13, makes sense, thanks. So here are all places that called GetDER from my "22 URLs" test: 840.7 KiB net::x509_util::GetDER(x509_st*, base::BasicStringPiece<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >*) 681.5 KiB net::X509Certificate::IsSameOSCert(x509_st*, x509_st*) 681.5 KiB InsertOrUpdate 681.5 KiB net::X509Certificate::X509Certificate(x509_st*, std::__1::vector<x509_st*, std::__1::allocator<x509_st*> > const&) 681.5 KiB net::X509Certificate::CreateFromHandle(x509_st*, std::__1::vector<x509_st*, std::__1::allocator<x509_st*> > const&) 678.5 KiB net::SSLClientSocketImpl::PeerCertificateChain::AsOSChain() const 3.0 KiB net::X509Certificate::CreateFromDERCertChain(std::__1::vector<base::BasicStringPiece<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, std::__1::allocator<base::BasicStringPiece<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > > const&) 103.4 KiB net::SSLClientSocketImpl::DoVerifyCert(int) 103.4 KiB net::SSLClientSocketImpl::DoHandshakeLoop(int) 103.4 KiB net::SSLClientSocketImpl::OnHandshakeIOComplete(int) 51.3 KiB net::X509Certificate::GetDEREncoded(x509_st*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >*) 41.0 KiB net::CertVerifier::RequestParams::RequestParams(scoped_refptr<net::X509Certificate>, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::vector<scoped_refptr<net::X509Certificate>, std::__1::allocator<scoped_refptr<net::X509Certificate> > >) 40.0 KiB net::SSLClientSocketImpl::DoVerifyCert(int) 1.0 KiB net::ProofVerifierChromium::Job::DoVerifyCert(int) 10.4 KiB net::ct::GetX509LogEntry(x509_st*, net::ct::LogEntry*) 10.4 KiB net::MultiLogCTVerifier::Verify(net::X509Certificate*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, net::ct::CTVerifyResult*, net::BoundNetLog const&) 2.8 KiB net::X509Certificate::CalculateCAFingerprint256(std::__1::vector<x509_st*, std::__1::allocator<x509_st*> > const&) 2.8 KiB net::X509Certificate::CalculateChainFingerprint256(x509_st*, std::__1::vector<x509_st*, std::__1::allocator<x509_st*> > const&) 2.8 KiB content::CertStoreImpl::StoreCert(net::X509Certificate*, int) 1.6 KiB net::X509Certificate::WriteOSCertHandleToPickle(x509_st*, base::Pickle*) 1.6 KiB net::X509Certificate::Persist(base::Pickle*) 1.6 KiB net::HttpResponseInfo::Persist(base::Pickle*, bool, bool) const
,
Aug 30 2016
Yeah, the remaining calls are all operating on an X509Certificate-owned OSCertHandle (either incoming or outgoing) except for the ::AsOSChain(), so I would expect you should see the 'duplicate' data drop once we fix that place. For context of why we added the DER cache, unless Davidben has new info with BoringSSL, 6 years ago it was showing up pretty high on CPU usage based on Chrome's usage pattern. On other platforms, whenever a certificate is parsed, the original form is kept alongside next to it. BoringSSL has the 'enc' member of the X509's cert_info ( https://cs.chromium.org/chromium/src/third_party/boringssl/src/include/openssl/x509.h?rcl=0&l=243 -> https://cs.chromium.org/chromium/src/third_party/boringssl/src/include/openssl/x509.h?rcl=1472469725&l=223 -> https://cs.chromium.org/chromium/src/third_party/boringssl/src/include/openssl/asn1.h?rcl=1472469725&l=245 ) which is supposed to store the original data, but i2d_X509 still re-enocdes, rather than using cert->cert_info->enc So it does mean we're storing *two* copies of the parsed cert for every X509 in an X509Certificate. Not ideal, and I think Davidben had thoughts on adding to the BoringSSL API to extract this data directly, but it's somewhat coupled to whether X509* is mutable (it is today, unfortunately; on other platforms, the ::OSCertHandle is not "technically" mutable)
,
Aug 30 2016
Slight correction: i2d_X509 uses cert->cert_info->enc, but it re-encodes the outer envelope. It's possible that was enough to show up fairly high up there? We're sadly not in a good place with in-memory certificate representations right now and need to set fire to OpenSSL's legacy X.509 code (it's some of the worst code in all of OpenSSL, which is really saying something), but that's a rather involved long-term project. I may try to figure out enough of that mess to have it store the Certificate rather than TBSCertificate representation, but the cost of changing that code is extremely high, so this is almost certainly a bad idea.
,
Aug 30 2016
So with the change in #8 situation improved ~2x: "TheVerge.com test": DERCache: 173 KiB @ 120 instances (99 KiB @ 71 unique), hit rate: 76.3% (3325/1030) (before: 310 KiB @ 244 instances) "22 URLs test": DERCache: 568 KiB @ 384 instances (250 KiB @ 176 unique), hit rate: 77.5% (17593/5121) (before: 1344 KiB @ 1027 instances) I took a trace and found that SSLClientSocketImpl::DoVerifyCert() calls GetDER() on server_cert_chain_.Get(0), i.e. we touch same certificate that previously was touched by SSLClientSocketImpl::PeerCertificateChain::AsOSChain(). So I simply commented that whole block (GetDER() + IsAllowedBadCert()), and voila: DERCache: 223 KiB @ 154 instances (223 KiB @ 154 unique), hit rate: 78.3% (14153/3931) on "22 URLs" test. BTW, SSLConfig::IsAllowedBadCert() does nothing with passed DER when allowed_bad_certs is empty, which I assume is often? If so, there is an opportunity for an optimization (for both IsAllowedBadCert() overloads).
,
Aug 31 2016
I'll send a CL that does the needful and fixes up IsAllowedBadCert(), now that we no longer have to worry about //remoting's weird sandboxing (which is why it used DER to begin with)
,
Sep 9 2016
Thanks for fixing this the proper way! The CL (crrev.com/2300533002) failed several times in a row - was it just bad luck, or are there any issues? Logs are expired, can't tell myself...
,
Sep 9 2016
Oh gosh, I hadn't realized it hadn't landed. Thanks for poking. I think it was just bad luck (it looked like a date issue with one device in the swarm), but if it fails again, I'll dig in.
,
Sep 13 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/74e99744fe3d88a5b4a759818fa813cd2d091bd7 commit 74e99744fe3d88a5b4a759818fa813cd2d091bd7 Author: rsleevi <rsleevi@chromium.org> Date: Tue Sep 13 20:35:25 2016 Stop caching DER-encoded certificates unnecessarily What was intended as a CPU performance short-circuit in SSLClientSocket for Open/BoringSSL implementations ended up introducing rather unfortunate and significant memory overhead, thus proving yet again that premature optimization is the root of all evil. While cleaning the unnecessary DER-conversion/caching up, a further optimization to reduce the overhead of adding certificate errors was pointed out, allowing us to avoid the overhead of storing another copy of the DER-encoded certificate. As //remoting-in-NaCl can now use BoringSSL directly, this unwinds http://crrev.com/93153 and uses the cached certificate directly. BUG= 642082 R=davidben@chromium.org, sergeyu@chromium.org Review-Url: https://codereview.chromium.org/2300533002 Cr-Commit-Position: refs/heads/master@{#418352} [modify] https://crrev.com/74e99744fe3d88a5b4a759818fa813cd2d091bd7/net/http/http_stream_factory_impl_job.cc [modify] https://crrev.com/74e99744fe3d88a5b4a759818fa813cd2d091bd7/net/socket/ssl_client_socket_impl.cc [modify] https://crrev.com/74e99744fe3d88a5b4a759818fa813cd2d091bd7/net/socket/ssl_client_socket_unittest.cc [modify] https://crrev.com/74e99744fe3d88a5b4a759818fa813cd2d091bd7/net/socket/ssl_server_socket_unittest.cc [modify] https://crrev.com/74e99744fe3d88a5b4a759818fa813cd2d091bd7/net/ssl/ssl_config.cc [modify] https://crrev.com/74e99744fe3d88a5b4a759818fa813cd2d091bd7/net/ssl/ssl_config.h [modify] https://crrev.com/74e99744fe3d88a5b4a759818fa813cd2d091bd7/remoting/protocol/ssl_hmac_channel_authenticator.cc
,
Sep 13 2016
Hurray!
,
Sep 19 2016
,
Sep 20 2016
,
Feb 28 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/50fee4d87b534d63aa9769d1c593c368a320edcd commit 50fee4d87b534d63aa9769d1c593c368a320edcd Author: davidben <davidben@chromium.org> Date: Tue Feb 28 02:12:55 2017 Restore SSL_SESSION/X509Certificate X509* sharing This effectively reverts https://codereview.chromium.org/2300533002 and replaces it with a smarter X.509 representation. We haven't gotten rid of X509* completely yet, but we're far enough along there that we can improve this. There are three flavors of X509* that Chromium potentially keeps around in memory: 1. Vanilla X509*. This does not make the full DER form easily accessible, but does cache the DER form of the TBSCertificate. 2. X509* + DERCache. This is (1) with an extra copy of the full DER form accessible. The DER form is accessible (needed by lots of things) but we use a bunch more memory. 3. X509* + CRYPTO_BUFFER. This is a smarter version of (2). The full DER form is stored in a CRYPTO_BUFFER but rather than have it waste memory, we alias the cached TBSCertificate into it. (3) was added early on in switching from X509 to CRYPTO_BUFFER. When https://codereview.chromium.org/2300533002 was done, (3) did not exist, so we had this split where X509s hanging off SSL_SESSION did not need DERCache but net::X509Certificate did. Sharing the X509* was a nice memory optimization in one direction (fewer X509s in memory---X509s are really *really* inefficient, which is one of the motivations in removing them), but at the cost of stapling DERCache to more things. Whenever the SSL_SESSION's reference outlives the net::X509Certificate's reference, 2300533002 is a win. When both are alive in memory, it is a loss. Now, every SSL_SESSION-owned X509 is of type (3) anyway, so we can be smarter. Undo the X509*-side regression and instead just don't staple DERCache onto X509*s of type (3). They already have a free DERCache on them. This achieves 2300533002's goals without the X509* tradeoff. BUG= 671420 , 642082 Review-Url: https://codereview.chromium.org/2694903006 Cr-Commit-Position: refs/heads/master@{#453455} [modify] https://crrev.com/50fee4d87b534d63aa9769d1c593c368a320edcd/net/cert/x509_util_openssl.cc [modify] https://crrev.com/50fee4d87b534d63aa9769d1c593c368a320edcd/net/socket/ssl_client_socket_impl.cc |
||||||
►
Sign in to add a comment |
||||||
Comment 1 by dskiba@chromium.org
, Aug 29 2016