grunt-paladin:2669 failed in UnitTest for quipper, bsdiff and libbrillo
Issue description
Filed by sheriff-o-matic@appspot.gserviceaccount.com on behalf of dzigterman@google.com

grunt-paladin:2669 failed

Builders failed on:
- grunt-paladin: https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8936780095669332528

Also two other runs:
https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?builderName=grunt-paladin&buildNumber=2669
https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?id=2893842
,
Aug 30
We've seen flaky quipper unittests before: crbug.com/812425. Is quipper even maintained any more?
,
Aug 30
Actually build 2669 (and 2668 & 2670) all had three unittest failures:

Packages failed:
chromeos-base/quipper-0.0.1-r2413
dev-util/bsdiff-4.3.1-r16
chromeos-base/libbrillo-0.0.1-r1361

bsdiff-4.3.1-r16: cwd: /tmp/portage/dev-util/bsdiff-4.3.1-r16/work/bsdiff-4.3.1/platform2/bsdiff
bsdiff-4.3.1-r16: cmd: {/var/cache/portage/dev-util/bsdiff/out/Default/bsdiff_unittest} '/var/cache/portage/dev-util/bsdiff/out/Default/bsdiff_unittest'
bsdiff-4.3.1-r16: [==========] Running 66 tests from 14 test cases.
bsdiff-4.3.1-r16: [----------] Global test environment set-up.
bsdiff-4.3.1-r16: [----------] 2 tests from BrotliCompressorTest
bsdiff-4.3.1-r16: [ RUN ] BrotliCompressorTest.BrotliCompressorSmoke
bsdiff-4.3.1-r16: ERROR 08-30 10:51:58 ../../../../../../../tmp/portage/dev-util/bsdiff-4.3.1-r16/work/bsdiff-4.3.1/platform2/bsdiff/brotli_decompressor.cc:49: Decompressor reached EOF while reading from input stream.
bsdiff-4.3.1-r16: ../../../../../../../tmp/portage/dev-util/bsdiff-4.3.1-r16/work/bsdiff-4.3.1/platform2/bsdiff/brotli_compressor_unittest.cc:33: Failure
bsdiff-4.3.1-r16: Value of: brotli_decompressor.Read(decompressed_data.data(), sizeof(kHelloWorld))
bsdiff-4.3.1-r16: Actual: false
bsdiff-4.3.1-r16: Expected: true
bsdiff-4.3.1-r16: terminating with uncaught exception of type testing::internal::GoogleTestFailureException: ../../../../../../../tmp/portage/dev-util/bsdiff-4.3.1-r16/work/bsdiff-4.3.1/platform2/bsdiff/brotli_compressor_unittest.cc:33: Failure
bsdiff-4.3.1-r16: Value of: brotli_decompressor.Read(decompressed_data.data(), sizeof(kHelloWorld))
bsdiff-4.3.1-r16: Actual: false
bsdiff-4.3.1-r16: Expected: true
bsdiff-4.3.1-r16: Error: /var/cache/portage/dev-util/bsdiff/out/Default/bsdiff_unittest: failed with signal SIGIOT|SIGABRT(6)
bsdiff-4.3.1-r16: * ERROR: dev-util/bsdiff-4.3.1-r16::chromiumos failed (test phase):

libbrillo-0.0.1-r1361: [ RUN ] DBusUtils.Protobuf
libbrillo-0.0.1-r1361: [ERROR:message.cc(907)] Failed to parse protocol buffer from array
libbrillo-0.0.1-r1361: ../../../../../../../tmp/portage/chromeos-base/libbrillo-0.0.1-r1361/work/libbrillo-0.0.1/libbrillo/brillo/dbus/data_serialization_unittest.cc:785: Failure
libbrillo-0.0.1-r1361: Value of: PopValueFromReader(&reader, &test_message_out)
libbrillo-0.0.1-r1361: Actual: false
libbrillo-0.0.1-r1361: Expected: true
libbrillo-0.0.1-r1361: [ FAILED ] DBusUtils.Protobuf (0 ms)
libbrillo-0.0.1-r1361: [----------] 33 tests from DBusUtils (78 ms total)
libbrillo-0.0.1-r1361:
libbrillo-0.0.1-r1361: [----------] 4 tests from DBusMethodInvokerTest
libbrillo-0.0.1-r1361: [ RUN ] DBusMethodInvokerTest.TestSuccess
libbrillo-0.0.1-r1361: [ OK ] DBusMethodInvokerTest.TestSuccess (1 ms)
libbrillo-0.0.1-r1361: [ RUN ] DBusMethodInvokerTest.TestFailure
libbrillo-0.0.1-r1361: [ERROR:dbus_method_invoker.h(112)] CallMethodAndBlockWithTimeout(...): Domain=dbus, Code=org.MyError, Message=My error message
libbrillo-0.0.1-r1361: [ OK ] DBusMethodInvokerTest.TestFailure (0 ms)
libbrillo-0.0.1-r1361: [ RUN ] DBusMethodInvokerTest.TestProtobuf
libbrillo-0.0.1-r1361: [ERROR:message.cc(907)] Failed to parse protocol buffer from array
libbrillo-0.0.1-r1361: [ERROR:dbus_method_invoker_unittest.cc(134)] Unexpected method call: message_type: MESSAGE_METHOD_CALL
libbrillo-0.0.1-r1361: interface: org.test.Object.TestInterface
libbrillo-0.0.1-r1361: member: TestMethod3
libbrillo-0.0.1-r1361: signature: ay
libbrillo-0.0.1-r1361:
libbrillo-0.0.1-r1361: array [
libbrillo-0.0.1-r1361: byte 8
libbrillo-0.0.1-r1361: byte 123
libbrillo-0.0.1-r1361: byte 18
libbrillo-0.0.1-r1361: byte 3
libbrillo-0.0.1-r1361: byte 98
libbrillo-0.0.1-r1361: byte 97
libbrillo-0.0.1-r1361: byte 114
libbrillo-0.0.1-r1361: byte 0
libbrillo-0.0.1-r1361: byte 0
libbrillo-0.0.1-r1361: byte 0
libbrillo-0.0.1-r1361: byte 0
libbrillo-0.0.1-r1361: byte 0
libbrillo-0.0.1-r1361: byte 0
libbrillo-0.0.1-r1361: byte 0
libbrillo-0.0.1-r1361: ]
libbrillo-0.0.1-r1361:
libbrillo-0.0.1-r1361: [ERROR:dbus_method_invoker.h(118)] CallMethodAndBlockWithTimeout(...): Domain=dbus, Code=org.freedesktop.DBus.Error.Failed, Message=Failed to call D-Bus method: org.test.Object.TestInterface.TestMethod3
libbrillo-0.0.1-r1361: ../../../../../../../tmp/portage/chromeos-base/libbrillo-0.0.1-r1361/work/libbrillo-0.0.1/libbrillo/brillo/dbus/dbus_method_invoker_unittest.cc:156: Failure
libbrillo-0.0.1-r1361: Expected: (nullptr) != (response.get()), actual: 8-byte object <00-00 00-00 00-00 00-00> vs NULL
libbrillo-0.0.1-r1361: [FATAL:dbus_method_invoker.h(232)] Check failed: message. Unable to extract parameters from a NULL message.
libbrillo-0.0.1-r1361: /usr/lib64/libbase-core-395517.so(_ZN4base5debug10StackTraceC1Ev+0x13) [0x7f8bed29cea3]
libbrillo-0.0.1-r1361:
libbrillo-0.0.1-r1361: Error: /var/cache/portage/chromeos-base/libbrillo/out/Default/libbrillo-395517_unittests: failed with signal SIGIOT|SIGABRT(6)
libbrillo-0.0.1-r1361: * ERROR: chromeos-base/libbrillo-0.0.1-r1361::chromiumos failed (test phase):
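A note on reading the byte array in the TestProtobuf dump: the sketch below walks those 14 bytes as protobuf wire format. It is only an illustration, not libbrillo or protobuf library code; the helper names `read_varint` and `decode_fields` are made up for this example, and it handles just the two wire types that appear in the dump. It suggests the array holds a 7-byte message (field 1 = 123, field 2 = "bar") followed by seven zero bytes. Field number 0 is not valid in the wire format, which would be consistent with the "Failed to parse protocol buffer from array" error, though this alone does not establish where the extra bytes came from.

```python
# Bytes copied from the DBusMethodInvokerTest.TestProtobuf dump in the log.
DUMPED = bytes([8, 123, 18, 3, 98, 97, 114, 0, 0, 0, 0, 0, 0, 0])


def read_varint(buf, pos):
    """Decode one base-128 varint starting at pos; return (value, next_pos)."""
    value = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def decode_fields(buf):
    """Walk buf as protobuf wire format and stop at the first invalid tag.

    Returns (fields, stop_offset). Only wire types 0 (varint) and
    2 (length-delimited) are handled, which is all this dump needs.
    """
    fields = []
    pos = 0
    while pos < len(buf):
        start = pos
        tag, pos = read_varint(buf, pos)
        field_number, wire_type = tag >> 3, tag & 0x7
        if field_number == 0:
            return fields, start  # field number 0 is not a valid tag
        if wire_type == 0:
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:
            length, pos = read_varint(buf, pos)
            value = buf[pos:pos + length]
            pos += length
        else:
            return fields, start  # wire type not handled by this sketch
        fields.append((field_number, value))
    return fields, pos


fields, stop = decode_fields(DUMPED)
print(fields, stop)
```

Decoding stops at offset 7, the first of the trailing zero bytes, with 7 of the 14 bytes left over.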
,
Aug 30
Immediately before these failures was build 2668 which was aborted during BuildPackages due to an exception [0]. Prior to the aborted run, these unittests were all passing. Is it possible that this exception left the builder in a bad state such that these unittests now fail? Can we restart the builder or something to see if it recovers? [0] https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?id=2892010
,
Aug 30
Mike volunteered to restart the builder.
,
Aug 30
A reboot is harmless, but we shouldn't be reimaging builders: there's no immediately obvious technical way that corruption of the chroot can happen during an abort. We do not persist an aborted build's disk state into the next build in any way. On the other hand, reimaging has very negative impacts on the CQ and our bandwidth utilization. The only way there could be persistent impacts on a build from failed-to-failed is if the kernel was impacted or there was a chroot escape and a process is somehow still running (hasn't happened in a long time).
,
Aug 30
The issue appears to be related to this being AMD and it having some mixed results when executing on the GOLO physical machines. The past successful builds were all executed on cros-beefy-29-c2, which is a GCE instance that suddenly went offline, causing the GOLO builder to take over. I'm pushing this back to GCE to resolve, and we're going to see about removing GOLO completely from the Grunt mix to prevent this until buildbot is completely shut down. -- Mike
,
Aug 30
This is issue 849137 again.
,
Aug 30
I think we've addressed this now by shifting the builder back to a CPU that's compatible with the ISA used by grunt/stonyridge. General follow-ups are tracked in issue 856686.
,
Aug 30
We are waiting for the next CQ run to validate, but generally I agree that this should be resolved now with the GCE instance back as primary. -- Mike
Comment 1 by dzigterman@chromium.org
, Aug 30