New issue
Advanced search Search tips

Issue 792722 link

Starred by 2 users

Issue metadata

Status: Untriaged
Owner: ----
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Bug



Sign in to add a comment

Remove net::ParsedCertificate.

Project Member Reported by eroman@chromium.org, Dec 6 2017

Issue description

Currently, net::ParsedCertificate (net/cert/internal/parsed_certificate) is a threadsafe, immutable, reference-counted, wrapper for certificate bytes, and parsed fields of the certificate.

It is being used as a handle to a parsed certificate throughout the //net/cert/internal/* code. This approach mostly isolates parsing of certificates and error handling to the caller.

However, this abstraction is not a good fit for today's code.

Especially with the move to representing certificates by CRYPTO_BUFFER. There are a number of concerns with the ParsedCertificate approach that I can go into separately if more motivation is warranted.

My concrete proposal is to delete ParsedCertificate, and replace the code in //net/cert/internal (and throughout) to operate exclusively on a DER-based representation: CRYPTO_BUFFER, der::Input, or base::StringPiece. The consuming code would then lazily parse bits of the certificate it is interested in during the course of path building/verification.

The immediate benefits of this is to unify the different interfaces to certificate parsing we have grown. Namely, asn1_util.h, parsed_certificate.h, parse_certificate.h, and feature-specific files.

The end result after this refactor would be to unify on a single API akin to asn1_util.h, that extracts data from a given set of DER bytes (either directly aliased bytes in the case of TLV, or caller-owned datastructures in the case where we want to do normalizations).

... I have looked at this in the past and believe it would be a good fit, but postponed because of the non-trivial effort in refactoring all the code. But we should probably do this sooner rather than later.

Some other (less desirable) consequences of this refactor are:

 * Would change how we create zero-copy certificates (i.e. ParsedCertificate::CreateWithoutCopyingUnsafe). Right now Cast uses this to inflate a certificate from static data without making an extra copy. I am not familiar enough with CRYPTO_BUFFER, but my impression is that it owns a copy of the data and would not allow aliasing non-owned data. So while CRYPTO_BUFFER is an attractive means of sharing ownership of the DER bytes, it may not work as a direct container for this use-case. This isn't a deal-breaker, as there are other ways to solve it.

 * Will have performance implications from extra re-parsing (will need to measure). My gut feeling is this will be negligible, Looking at the code spanning CertVerifier and CertVerifyProc, we already do lots of extra re-parsing steps especially with the move towards USE_BYTE_CERTS. So it is already the case that hot paths for parsing need to be optimized, and parsing of fields grouped into the appropriate layers to avoid duplicating the effort. Lastly, upfront eager parsing is worse for performance in other cases where it does more work than necessary. Which brings us to:

 * May weaken strictness of certificate parsing. With the up-front parsing approach, some care was taken to ensure that the entirety of the certificate was checked during parsing. Namely, that unused fields were still valid according to the ASN.1, and that extra data did not appear in places that lacked extension points. With the lazy-parsing approach where clients grab only what they want, we have the benefit of not needing to parse all the data, but as a result will not end up necessarily checking all parts. We are likely to become more permissive on parsing failures provided the field isn't needed during verification.

 * Will change some error handling for parsing failures. Currently certificate parsing failures are largely isolated from CertPathBuilder and its dependencies in the net/internal layer cake. This is nice in some ways, since the client can decide what level of strictness to parse certificates with (i.e. ParseCertificateOptions), and deal with parsing errors while converting DER into ParsedCertificates to drive pathbuilding. On the other hand, handling parsing errors prior to verification is incomplete since some parsing invariably happens in the course of path building (AIA, intermediates fetched from system trust store, delegated OCSP) anyway. With this change, code in say verify_certificate_chain that was previously oblivious to parsing will need to lazily extract, cache, test, and re-throw as appropriate (which is a more tedious code-pattern).

Some alternatives considered and rejected for ParsedCertificate are:

 * Instead of eager parsing into an immutable object, do lazy parsing and cache what is useful for performance. This implies some form of locking, and all the issues inherent to that. This approach also doesn't have a good story for build up of bloat, and the synchronization could be worse for performance than just re-parsing in many cases.

 * Make ParsedCertificate non-reference counted, thread hostile, and owned by the caller. This effectively turns it into a "LazyParsedCertificate" which can be passed between layers of an individual certificate verification to avoid re-parsing common structures. There may be opportunity for this approach on a limited scale (X509Certificate for instance serves as a cache for the serial, validity, subject, and issuer), but as a general-purpose structure it doesn't address any of the code complexity concerns, and the performance benefits are also limited as this parse cache is not shared.
 
We discussed the end-state API and some of the issues with it, relative to the use cases, early on in the design and implementation. The consequences you mention were things we were intentionally trying to avoid - and the goal was to do sufficient up-front parsing for the purposes of path building, and also avoid unnecessary re-parsing.

Let's sync up to understand more about the code complexity concerns, since the different approaches being discussed (immutable and thread safe, owned and thread hostile, lazy with locks) are all APIs we can see deployed in the real world, and all have their own set of tradeoffs that affect the code consuming them.

Sign in to add a comment