
Issue 670885


Issue metadata

Status: Fixed
Owner: ----
Closed: Jul 2017
OS: All
Pri: 3
Type: Bug

Blocked on:
issue 670934




base/numerics still contains a bunch of undefined behavior

Project Member Reported by thakis@chromium.org, Dec 2 2016

Issue description

To verify my fix (and sufficient test coverage) for bug 669642, I built base_unittests with ubsan in a mode that finds undefined float conversions. Once https://codereview.chromium.org/2550593004/ is through the cq, just setting `is_ubsan=true is_ubsan_no_recover=true` will be enough for that.

However, there are still several problems in the code just with the existing tests.

To repro, add third_party/llvm-build/Release+Asserts/bin to your PATH (else your stacks won't be symbolized) and run `out/gn/base_unittests --gtest_filter=SafeNumerics.*`

This reports:

[ RUN      ] SafeNumerics.FloatOperations
../../base/numerics/safe_math_impl.h:868:31: runtime error: value 1.79769e+308 is outside the range of representable values of type 'float'
    #0 0x7218f2 in CheckedNumericState<double> base/numerics/safe_math_impl.h:868:31
    #1 0x7218f2 in CheckedNumeric<double> base/numerics/safe_math.h:86
    #2 0x7218f2 in Cast<float> base/numerics/safe_math.h:169
    #3 0x7218f2 in Test base/numerics/safe_numerics_unittest.cc:470
    #4 0x7218f2 in SafeNumerics_FloatOperations_Test::TestBody() base/numerics/safe_numerics_unittest.cc:669
    #5 0xb5e064 in testing::Test::Run() testing/gtest/src/gtest.cc:2474:5
    #6 0xb5ebcb in testing::TestInfo::Run() testing/gtest/src/gtest.cc:2656:11
    #7 0xb5f3ce in testing::TestCase::Run() testing/gtest/src/gtest.cc:2774:28
    #8 0xb651db in testing::internal::UnitTestImpl::RunAllTests() testing/gtest/src/gtest.cc:4647:43
    #9 0xb64ce5 in testing::UnitTest::Run() testing/gtest/src/gtest.cc:4255:10
    #10 0xb306f9 in RUN_ALL_TESTS testing/gtest/include/gtest/gtest.h:2237:46
    #11 0xb306f9 in base::TestSuite::Run() base/test/test_suite.cc:271
    #12 0xb3a364 in Run base/callback.h:85:12
    #13 0xb3a364 in base::(anonymous namespace)::LaunchUnitTestsInternal(base::Callback<int (), (base::internal::CopyMode)1, (base::internal::RepeatMode)1> const&, int, int, bool, base::Callback<void (), (base::internal::CopyMode)1, (base::internal::RepeatMode)1> const&) base/test/launcher/unit_test_launcher.cc:211
    #14 0xb3a1e7 in base::LaunchUnitTests(int, char**, base::Callback<int (), (base::internal::CopyMode)1, (base::internal::RepeatMode)1> const&) base/test/launcher/unit_test_launcher.cc:453:10
    #15 0xb25cf7 in main base/test/run_all_base_unittests.cc:22:10
    #16 0x7f311c700f44 in __libc_start_main /build/eglibc-oGUzwX/eglibc-2.19/csu/libc-start.c:287
    #17 0x46d5e8 in _start (/usr/local/google/home/thakis/src/chrome/src/out/gn/base_unittests+0x46d5e8)


This is in:

// specified class. By default, it will return 0.
template <typename Dst,
          class NaNHandler = SaturatedCastNaNBehaviorReturnZero,
          typename Src>
constexpr Dst saturated_cast(Src value) {
  return std::numeric_limits<Dst>::is_iec559
             ? static_cast<Dst>(value)  // Floating point optimization.  /// <-- HERE
             : internal::saturated_cast_impl<Dst, NaNHandler>(
                   value, internal::DstRangeRelationToSrcRange<Dst>(value));
}

Alright, an incorrect optimization. If I remove that optimization and always go down the slow path, then these two tests in safe_numerics_unittest.cc start failing (not due to undefined behavior, just because the result is slightly different):

  EXPECT_EQ(double_infinity, saturated_cast<float>(double_large));
  EXPECT_EQ(-double_infinity, saturated_cast<float>(-double_large));

This might be a bug in the test. (Also, safe_numerics_unittest.cc has the EXPECT_EQ arguments in the wrong order almost throughout. The convention is EXPECT_EQ(expected, actual); when that's not followed, test failure printouts are super confusing. Someone should swap most of the EXPECT_EQ args in that file.)
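For reference, the slow path can be made well-defined by doing the range comparison in the wider type before the conversion ever happens. A minimal sketch (a made-up name, not the actual base/numerics implementation):

```cpp
#include <cmath>
#include <limits>

// Hypothetical sketch, not the base/numerics implementation: a
// double -> float saturating cast that performs its range check with
// well-defined double comparisons *before* the static_cast, so the
// conversion is never executed on an out-of-range value.
float SaturatingDoubleToFloat(double value) {
  constexpr double kFloatMax = std::numeric_limits<float>::max();
  if (std::isnan(value))
    return 0.0f;  // Mirrors SaturatedCastNaNBehaviorReturnZero.
  if (value > kFloatMax)
    return std::numeric_limits<float>::infinity();
  if (value < -kFloatMax)
    return -std::numeric_limits<float>::infinity();
  return static_cast<float>(value);  // In range, so well-defined.
}
```

This matches what the two EXPECT_EQ lines above expect (out-of-range doubles saturate to +/-infinity) without taking the static_cast shortcut.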

I tried to investigate and saw this further up:

  double double_infinity = numeric_limits<float>::infinity();

I thought to myself "this looks weird" and changed it to

  double double_infinity = numeric_limits<double>::infinity();

Now ubsan finds undefined behavior again:

../../base/numerics/safe_math_impl.h:868:31: runtime error: value 1.79769e+308 is outside the range of representable values of type 'float'
    #0 0x7218f2 in CheckedNumericState<double> base/numerics/safe_math_impl.h:868:31
    #1 0x7218f2 in CheckedNumeric<double> base/numerics/safe_math.h:86
    #2 0x7218f2 in Cast<float> base/numerics/safe_math.h:169
    #3 0x7218f2 in Test base/numerics/safe_numerics_unittest.cc:470
    #4 0x7218f2 in SafeNumerics_FloatOperations_Test::TestBody() base/numerics/safe_numerics_unittest.cc:669
    #5 0xb5ea94 in testing::Test::Run() testing/gtest/src/gtest.cc:2474:5
    #6 0xb5f5fb in testing::TestInfo::Run() testing/gtest/src/gtest.cc:2656:11
    #7 0xb5fdfe in testing::TestCase::Run() testing/gtest/src/gtest.cc:2774:28
    #8 0xb65c0b in testing::internal::UnitTestImpl::RunAllTests() testing/gtest/src/gtest.cc:4647:43
    #9 0xb65715 in testing::UnitTest::Run() testing/gtest/src/gtest.cc:4255:10
    #10 0xb31129 in RUN_ALL_TESTS testing/gtest/include/gtest/gtest.h:2237:46
    #11 0xb31129 in base::TestSuite::Run() base/test/test_suite.cc:271
    #12 0xb3ad94 in Run base/callback.h:85:12
    #13 0xb3ad94 in base::(anonymous namespace)::LaunchUnitTestsInternal(base::Callback<int (), (base::internal::CopyMode)1, (base::internal::RepeatMode)1> const&, int, int, bool, base::Callback<void (), (base::internal::CopyMode)1, (base::internal::RepeatMode)1> const&) base/test/launcher/unit_test_launcher.cc:211
    #14 0xb3ac17 in base::LaunchUnitTests(int, char**, base::Callback<int (), (base::internal::CopyMode)1, (base::internal::RepeatMode)1> const&) base/test/launcher/unit_test_launcher.cc:453:10
    #15 0xb26727 in main base/test/run_all_base_unittests.cc:22:10
    #16 0x7f251974ff44 in __libc_start_main /build/eglibc-oGUzwX/eglibc-2.19/csu/libc-start.c:287
    #17 0x46d5e8 in _start (/usr/local/google/home/thakis/src/chrome/src/out/gn/base_unittests+0x46d5e8)

That's here:


  template <typename Src>
  constexpr explicit CheckedNumericState(
      Src value,
      typename std::enable_if<std::numeric_limits<Src>::is_specialized,
                              int>::type = 0)
      : value_(static_cast<T>(value)) {}


So this needs a range check before casting as well. At this point I ran out of yak-shaving budget and filed this bug instead.
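One way to gate that static_cast would be a predicate evaluated in the source type. A hedged sketch (the name and shape are made up, not the actual fix):

```cpp
#include <limits>

// Hypothetical sketch: checks, using comparisons in the source type,
// whether |value| fits in the destination floating-point type. A
// constructor like the one above could consult this and mark the
// state invalid instead of performing an out-of-range conversion.
template <typename Dst, typename Src>
constexpr bool IsValueInRangeForFloat(Src value) {
  return value <= static_cast<Src>(std::numeric_limits<Dst>::max()) &&
         value >= -static_cast<Src>(std::numeric_limits<Dst>::max());
}
```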


Ideally we'd have a fuzzer that runs CheckedNumeric ops with fuzzed numbers under ubsan, to get some confidence that this code works. And then our compiler friends can no longer make fun of the irony of having undefined behavior in our safe_math code :-)
 
Blockedon: 670934
> Ideally we'd have a fuzzer that runs CheckedNumeric ops with fuzzed numbers under ubsan, to get some confidence that this code works. And then our compiler friends can no longer make fun of the irony of having undefined behavior in our safe_math code :-)

There was a conscious tradeoff here that I should explain. Basically, I lump undefined behavior into two classes:

Strong - Execution logic depends in some way on the result of an operation with undefined behavior (e.g. triggering signed overflow and branching on or otherwise using the result).
Weak - Undefined behavior is triggered, but no program logic depends on it and no resulting values are ever used (e.g. triggering signed overflow, but never using the value because the overflow was detected via well-defined behavior).
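To make the distinction concrete, here is a sketch of detection done entirely with well-defined operations: the addition happens in unsigned arithmetic (where wraparound is defined), and the signed result is produced only once validity is established. This illustrates the pattern, not the actual safe_math code:

```cpp
#include <cstdint>

// Illustrative sketch (not the safe_math implementation): the add is
// performed in uint32_t, where wraparound is fully defined, and
// overflow is detected from the sign bits. No signed overflow is ever
// triggered, so nothing depends on undefined behavior, "weak" or
// otherwise.
bool CheckedAdd(int32_t a, int32_t b, int32_t* result) {
  uint32_t ua = static_cast<uint32_t>(a);
  uint32_t ub = static_cast<uint32_t>(b);
  uint32_t sum = ua + ub;  // Well-defined modular arithmetic.
  // Signed overflow would have occurred iff a and b have the same
  // sign and the sum's sign differs from both.
  if (((ua ^ sum) & (ub ^ sum)) >> 31)
    return false;
  // Unsigned->signed conversion is implementation-defined (not
  // undefined) before C++20 and exact from C++20 on.
  *result = static_cast<int32_t>(sum);
  return true;
}
```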

Originally I wrote this code allowing weak but not strong undefined behavior, which is an invariant I'm fairly certain still holds. I took this approach because it allowed me to remove numerous branches in the most common code paths.

Since we started running ubsan I've been approving patches that remove several of these occurrences of weak undefined behavior. However, I am a bit concerned that we're just hurting performance and increasing code complexity while not actually fixing any real bugs. Because I'm not aware of any compilers that would break the code on weak undefined behavior, and in fact I would argue that such a thing would represent a compiler bug.

Here's an alternative proposal: If my assumptions about weak undefined behavior are reasonable and we can rely on the compiler doing what we expect in these cases--which the current test coverage verifies--how about we just keep the optimizations and annotate these locations to silence the tools?
From a language lawyer perspective, "weak undefined behavior" isn't. This is similar to "benign races" (https://software.intel.com/en-us/blogs/2013/01/06/benign-data-races-what-could-possibly-go-wrong): compilers optimize in creative and unpredictable ways (see e.g. bug 669642, where some undefined behavior was harmless in practice until last week, when it broke exclusively on chrome/android/clang).

From a pragmatic perspective, we have all kinds of undefined behavior in chrome, most of it unintentional. But some of it is intentional: for example, we build with -fno-strict-aliasing, which disables some optimizations that would only break code with undefined behavior (casting memory is impossible without undefined behavior, bit_cast<>() is not always convenient to use, and the union aliasing technique is C-only and even there not fully standardized afaik, so doing the Right Thing depends on the standard deciding what the Right Thing is); we deref 0 to force crashes in a few places; etc.

So I think what we want to do is use UBSan to find undefined behavior, start with a small list of undefined behavior, and then in priority order add more and more checks as we clean up the codebase. If we find checks that require unreasonable fixes, we could decide to not turn them on (and then I suppose give feedback to the C++ standards folks in some form).

It sounds like you're claiming that the double->float casts are in that category. It's possible that that's true. http://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html lists only -fsanitize=float-cast-overflow, which covers both float->int (which we know can cause miscompiles) and double->float. So I think we'd want to split that flag in two in ubsan, then turn on only one of the two (at least for now) and collect data to make a case that maybe double->float shouldn't be undefined or something.

But just leaving undefined behavior and hoping for the best guarantees that that code will break one day.
You kinda danced around my question, and the comparisons to race conditions aren't really valid, because all of the examples are clearly not benign and are in fact relying on the result of undefined behavior.

So, back to my question, I agree that we should not have any code that relies directly on undefined behavior. However, we keep adding additional code and complexity here where we do not actually rely directly on the result of undefined behavior. Taking the case of the conversion from an out-of-bounds float to an integral, what's the harm if the value is never used because well-defined behavior has determined it would be invalid? Why can't we just use the simpler, more performant approach of allowing the operation unconditionally, annotating it, and ensuring we have test coverage? If the compiler removes the undefined conversion when it detects it--which seems like the most extreme thing it could do--then isn't that all the better, since we were never going to use that value anyway?

> Taking the case of the conversion from an out-of-bounds float to an integral, what's the harm if the value is never used because well-defined behavior has determined it would be invalid?

The harm in that case was that the compiler figured that since an out-of-bounds float-to-integral conversion is undefined behavior, it could assume the result would fit in a byte (the integral was a uint8_t). It then packed that byte together with the is_valid byte into a uint16_t, and when the out-of-bounds float got written to that uint16_t, the bits beyond the low 8 spilled into the is_valid field, which broke things.

For double->float conversions we can't think of anything like this yet, but something similarly abstruse could happen later. I'd wager that we don't have a ton of CheckedNumeric<float>, so I'm not sure handling this correctly would be very expensive. (We likely have a ton of static_cast<float>(my_double) and all of that is technically wrong but works in practice -- maybe doubles happen to be smaller than FLT_MAX in practice, or compilers don't exploit this type of undefined behavior yet).
> The harm in that case was that the compiler figured that since an out-of-bounds float-to-integral conversion is undefined behavior, it could assume the result would fit in a byte (the integral was a uint8_t). It then packed that byte together with the is_valid byte into a uint16_t, and when the out-of-bounds float got written to that uint16_t, the bits beyond the low 8 spilled into the is_valid field, which broke things.

All of the inappropriate spillage is why compiler devs never get invited to the good parties.

Comment 8 by thakis@chromium.org, Dec 21 2016

Is this done?

Comment 9 by jsc...@chromium.org, Dec 22 2016

I don't know. I fixed the obvious floating point thing, but you cited "a bunch" in the title, and I'm not sure where that measurement comes from.
Running base_unittests with -fsanitize=float-cast-overflow enabled. I'll try again and see how it goes.
Components: -Security Internals>BaseNumerics
Status: Fixed (was: Untriaged)
Gonna assume I cleaned this up sufficiently.
