New issue
Advanced search Search tips
Starred by 11 users

Issue metadata

Status: Assigned
Owner:
Cc:
OS: All
Pri: 3
Type: Bug


Previous locations:
chromium:567501


Sign in to add a comment
link

Issue 38: Make BoringSSL resilient to failing to ERR_clear_error.

Reported by davidben@chromium.org, Dec 8 2015 Project Member

Issue description

Realized we never put a bug on file here. Among problems with OpenSSL's... unique... error system is that it's very easy to get it into a messed up state. Like a lot of other platforms, OpenSSL stashes errors out-of-band (a la errno and GetLastError).

One rarely needs to clear errno. If some syscall doesn't fail (signalled in-band), sampling errno will return unhelpful things, but so long as the consumer only samples errno immediately after a failing syscall, it doesn't matter what errno was before any given syscall.

OpenSSL's functions, however, are sensitive to the error state. Errors form a backtrace-like queue, with ERR_put_error adding to the backtrace, rather than clearing it, so existing error state will get munged into the current one. Moreover, SSL_get_error is very sensitive to the error state. As things are right now, the consumer needs to consume errors after every failed function but code often forgets to do this.

Options:

1.  ERR_clear_error on entry to every function. This would work and we're using thread-locals now, but it would be a huge hassle.

2. Audit all OPENSSL_PUT_ERROR calls and separate into OPENSSL_PUT_ERROR vs OPENSSL_ADD_ERROR. The former starts a new trace and implicitly calls ERR_clear_error. The latter appends and will assert that the queue is empty. This will take some care because code occasionally does:

  if (!SomethingThatUsesErr() ||
      !SomethingThatDoesNot()) {
    OPENSSL_PUT_ERROR(FOO, FOO_R_SOMETHING_FAILED);
    return 0;
  }

3. Give up on the backtrace. Go through all the code and remove all non-leaf OPENSSL_PUT_ERROR calls. Err acts like errno.

2 and 3 require more-or-less the same audit, so probably start with it. But getting rid of the queue altogether is somewhat tempting. (Adam?)

Regardless of which we go with, we'll also need to make SSL_get_error less sensitive to the error state (I think this should be possible, but I'll need to check on custom BIOs), or, short-term, we ERR_clear_error at the entry point to SSL layer functions which pair with SSL_get_error. (I think this is basically already the case, so finishing the job seems sane.)
 

Comment 1 by agl@chromium.org, Dec 13 2015

I prefer (2) to (3), I think. Although _ADD_ERROR isn't great. OPENSSL_START_ERROR?

Comment 2 by davidben@chromium.org, Dec 15 2015

Project Member
Some observations from toying with this slightly.

  if (!FOO_something_that_uses_err()) {
    OPENSSL_ADD_TO_ERROR(FOO, FOO_R_SOMETHING_FAILED);
    return 0;
  }

requires that FOO_something_that_uses_err always push to the error queue on failure. So we need to fix everything that is inconsistent. (I'm not terribly unhappy about this, but we'll have to be careful here.)

If we can't guarantee this because it's some external callback or too inconvenient, we have to instead do:

  ERR_clear_error();
  if (FOO_something_that_might_use_err()) {
    OPENSSL_PUT_ERROR(FOO, FOO_R_SOMETHING_FAILED);
    return 0;
  }

where PUT has the current semantics of starting a new error if empty and appending if not.

Also, we sometimes write code like:

  foo->a = BN_new();
  foo->b = BN_new();
  if (foo->a == NULL || foo->b == NULL) {
    return 0;
  }

This will produce a mangled error queue if both BN_new calls fail. These proposals would actually fix this. The second OPENSSL_START_ERROR would clobber the first chain.

Comment 3 by bugdroid1@chromium.org, Mar 14 2016

The following revision refers to this bug:
  https://boringssl.googlesource.com/boringssl.git/+/15c1488b6177b269a311814b63e670df534549e3

commit 15c1488b6177b269a311814b63e670df534549e3
Author: David Benjamin <davidben@google.com>
Date: Mon Mar 14 18:25:46 2016

Clear the error queue on entry to core SSL operations.

OpenSSL historically made some poor API decisions. Rather than returning a
status enum in SSL_read, etc., these functions must be paired with
SSL_get_error which determines the cause of the last error's failure. This
requires SSL_read communicate with SSL_get_error with some stateful flag,
rwstate.

Further, probably as workarounds for bugs elsewhere, SSL_get_error does not
trust rwstate. Among other quirks, if the error queue is non-empty,
SSL_get_error overrides rwstate and returns a value based on that. This
requires that SSL_read, etc., be called with an empty error queue. (Or we hit
one of the spurious ERR_clear_error calls in the handshake state machine,
likely added as further self-workarounds.)

Since requiring callers consistently clear the error queue everywhere is
unreasonable (crbug.com/567501), clear ERR_clear_error *once* at the entry
point. Until/unless[*] we make SSL_get_error sane, this is the most reasonable
way to get to the point that clearing the error queue on error is optional.

With those in place, the calls in the handshake state machine are no longer
needed. (I suspect all the ERR_clear_system_error calls can also go, but I'll
investigate and think about that separately.)

[*] I'm not even sure it's possible anymore, thanks to the possibility of
BIO_write pushing to the error queue.

BUG=567501,593963

Change-Id: I564ace199e5a4a74b2554ad3335e99cd17120741
Reviewed-on: https://boringssl-review.googlesource.com/7455
Reviewed-by: Steven Valdez <svaldez@google.com>
Reviewed-by: David Benjamin <davidben@google.com>

[modify] https://crrev.com/15c1488b6177b269a311814b63e670df534549e3/ssl/d1_clnt.c
[modify] https://crrev.com/15c1488b6177b269a311814b63e670df534549e3/ssl/d1_lib.c
[modify] https://crrev.com/15c1488b6177b269a311814b63e670df534549e3/ssl/d1_srvr.c
[modify] https://crrev.com/15c1488b6177b269a311814b63e670df534549e3/ssl/s3_clnt.c
[modify] https://crrev.com/15c1488b6177b269a311814b63e670df534549e3/ssl/s3_srvr.c
[modify] https://crrev.com/15c1488b6177b269a311814b63e670df534549e3/ssl/ssl_lib.c

Comment 4 by davidben@chromium.org, Apr 11 2016

Project Member
(Bulk edit to move to BoringSSL project.)

Comment 5 by davidben@chromium.org, Apr 11 2016

Project Member
Project: boringssl
Moved issue chromium:567501 to now be issue boringssl:38.

Comment 6 Deleted

Sign in to add a comment