kahlee unit tests failing even for unrelated CLs
Issue description
https://chromium-review.googlesource.com/#/q/comment:quipper+comment:libbrillo shows kahlee unit tests failing for various CLs even if the CL is completely unrelated to kahlee. From error logs:

quipper-0.0.1-r2021: Error: /var/cache/portage/chromeos-base/quipper/out/Default/perf_recorder_test: failed with signal SIGIOT|SIGABRT(6)
quipper-0.0.1-r2021: * ERROR: chromeos-base/quipper-0.0.1-r2021::chromiumos failed (test phase):
quipper-0.0.1-r2021: * (no error message)
quipper-0.0.1-r2021: *
quipper-0.0.1-r2021: * Call stack:
quipper-0.0.1-r2021: * ebuild.sh, line 93: Called src_test
quipper-0.0.1-r2021: * environment, line 3608: Called platform_src_test
quipper-0.0.1-r2021: * environment, line 3206: Called platform_pkg_test
quipper-0.0.1-r2021: * environment, line 3188: Called platform_test 'run' '/build/kahlee/var/cache/portage/chromeos-base/quipper/out/Default/perf_recorder_test' '1'
quipper-0.0.1-r2021: * environment, line 3239: Called die

libbrillo-0.0.1-r936: [ERROR:stream_utils_unittest.cc(142)] TestBody(...): Domain=stream.io, Code=invalid_parameter, Message=The stream offset value is out of range
libbrillo-0.0.1-r936: [ERROR:stream_utils_unittest.cc(147)] TestBody(...): Domain=stream.io, Code=invalid_parameter, Message=The stream offset value is out of range
libbrillo-0.0.1-r936: [ERROR:stream_utils_unittest.cc(151)] TestBody(...): Domain=stream.io, Code=invalid_parameter, Message=The stream offset value is out of range
[ and many similar ]
libbrillo-0.0.1-r936: [ERROR:message.cc(902)] Failed to parse protocol buffer from array
libbrillo-0.0.1-r936: [ERROR:dbus_method_invoker_unittest.cc(104)] Unexpected method call: message_type: MESSAGE_METHOD_CALL
libbrillo-0.0.1-r936: interface: org.test.Object.TestInterface
libbrillo-0.0.1-r936: member: TestMethod3
libbrillo-0.0.1-r936: signature: ay
libbrillo-0.0.1-r936:
libbrillo-0.0.1-r936: array [
libbrillo-0.0.1-r936: byte 8
libbrillo-0.0.1-r936: byte 123
libbrillo-0.0.1-r936: byte 18
libbrillo-0.0.1-r936: byte 3
libbrillo-0.0.1-r936: byte 98
libbrillo-0.0.1-r936: byte 97
libbrillo-0.0.1-r936: byte 114
libbrillo-0.0.1-r936: byte 0
libbrillo-0.0.1-r936: byte 0
libbrillo-0.0.1-r936: byte 0
libbrillo-0.0.1-r936: byte 0
libbrillo-0.0.1-r936: byte 0
libbrillo-0.0.1-r936: byte 0
libbrillo-0.0.1-r936: byte 0
libbrillo-0.0.1-r936: ]
libbrillo-0.0.1-r936:
libbrillo-0.0.1-r936: [ERROR:dbus_method_invoker.h(117)] CallMethodAndBlockWithTimeout(...): Domain=dbus, Code=org.freedesktop.DBus.Error.Failed, Message=Failed to call D-Bus method: org.test.Object.TestInterface.TestMethod3
libbrillo-0.0.1-r936: ../../../../../../../tmp/portage/chromeos-base/libbrillo-0.0.1-r936/work/libbrillo-0.0.1/platform2/libbrillo/brillo/dbus/dbus_method_invoker_unittest.cc:126: Failure
libbrillo-0.0.1-r936: Expected: (nullptr) != (response.get()), actual: 8-byte object <00-00 00-00 00-00 00-00> vs NULL
libbrillo-0.0.1-r936: [FATAL:dbus_method_invoker.h(228)] Check failed: message. Unable to extract parameters from a NULL message.
libbrillo-0.0.1-r936: /usr/lib64/libbase-core-395517.so(base::debug::StackTrace::StackTrace()+0x13) [0x7f534d015dd3]
libbrillo-0.0.1-r936:
libbrillo-0.0.1-r936: Error: /var/cache/portage/chromeos-base/libbrillo/out/Default/libbrillo-395517_unittests: failed with signal SIGIOT|SIGABRT(6)
libbrillo-0.0.1-r936: * ERROR: chromeos-base/libbrillo-0.0.1-r936::chromiumos failed (test phase):
libbrillo-0.0.1-r936: * (no error message)
libbrillo-0.0.1-r936: *
libbrillo-0.0.1-r936: * Call stack:
libbrillo-0.0.1-r936: * ebuild.sh, line 93: Called src_test
libbrillo-0.0.1-r936: * environment, line 3647: Called platform_src_test
libbrillo-0.0.1-r936: * environment, line 3231: Called platform_pkg_test
libbrillo-0.0.1-r936: * environment, line 3211: Called platform_test 'run' '/build/kahlee/var/cache/portage/chromeos-base/libbrillo/out/Default/libbrillo-395517_unittests'
libbrillo-0.0.1-r936: * environment, line 3264: Called die

Looking through my own logs, it appears that "cros tryjob ... kahlee-paladin kahlee-release" has been failing consistently since at least August. Assigning to Infra - maybe kahlee-pre-cq should not be used for standard pre-cq runs if it is known to be failing. Please reassign if this is the wrong approach.
,
Oct 19 2017
groeck@, can you paste the link to the unit test failures? Did the unit tests fail because of infra issues? I see kahlee-paladin has been passing, except for some aborts caused by CQ self-destruction. https://uberchromegw.corp.google.com/i/chromeos/builders/kahlee-paladin
,
Oct 19 2017
Found an example: https://uberchromegw.corp.google.com/i/chromiumos.tryserver/builders/pre_cq/builds/62449/steps/UnitTest/logs/stdio

quipper-0.0.1-r2055: [libprotobuf FATAL ../../protobuf-3.3.0/src/google/protobuf/message_lite.cc:71] CHECK failed: (bytes_produced_by_serialization) == (byte_size_before_serialization): Byte size calculation and serialization were inconsistent. This may indicate a bug in protocol buffers or it may be caused by concurrent modification of quipper.PerfDataProto.
quipper-0.0.1-r2055: Error: /var/cache/portage/chromeos-base/quipper/out/Default/perf_recorder_test: failed with signal SIGIOT|SIGABRT(6)
quipper-0.0.1-r2055: * ERROR: chromeos-base/quipper-0.0.1-r2055::chromiumos failed (test phase):

libbrillo-0.0.1-r970: [ERROR:dbus_method_invoker.h(117)] CallMethodAndBlockWithTimeout(...): Domain=dbus, Code=org.freedesktop.DBus.Error.Failed, Message=Failed to call D-Bus method: org.test.Object.TestInterface.TestMethod3
libbrillo-0.0.1-r970: ../../../../../../../tmp/portage/chromeos-base/libbrillo-0.0.1-r970/work/libbrillo-0.0.1/platform2/libbrillo/brillo/dbus/dbus_method_invoker_unittest.cc:126: Failure
libbrillo-0.0.1-r970: Expected: (nullptr) != (response.get()), actual: 8-byte object <00-00 00-00 00-00 00-00> vs NULL
libbrillo-0.0.1-r970: [FATAL:dbus_method_invoker.h(228)] Check failed: message. Unable to extract parameters from a NULL message.
libbrillo-0.0.1-r970: /usr/lib64/libbase-core-395517.so(base::debug::StackTrace::StackTrace()+0x13) [0x7f9dee8fbdd3]
libbrillo-0.0.1-r970:
libbrillo-0.0.1-r970: Error: /var/cache/portage/chromeos-base/libbrillo/out/Default/libbrillo-395517_unittests: failed with signal SIGIOT|SIGABRT(6)
libbrillo-0.0.1-r970: * ERROR: chromeos-base/libbrillo-0.0.1-r970::chromiumos failed (test phase):
libbrillo-0.0.1-r970: * (no error message)

I found some kahlee-pre-cq runs that passed:
https://uberchromegw.corp.google.com/i/chromiumos.tryserver/builders/pre_cq/builds/62483
https://uberchromegw.corp.google.com/i/chromiumos.tryserver/builders/pre_cq/builds/62514

and some that failed:
https://uberchromegw.corp.google.com/i/chromiumos.tryserver/builders/pre_cq/builds/62473
https://uberchromegw.corp.google.com/i/chromiumos.tryserver/builders/pre_cq/builds/62449

The unit tests in kahlee-pre-cq are flaky. We should either find the owners of libbrillo and quipper to fix the unit tests, or remove the unit tests if necessary.
,
Oct 19 2017
mysql> select id, buildbucket_id, status from buildTable where build_config='kahlee-pre-cq' order by id desc limit 15;
+---------+---------------------+--------+
| id      | buildbucket_id      | status |
+---------+---------------------+--------+
| 1958878 | 8965337783442382464 | pass   |
| 1958401 | 8965347412157315936 | pass   |
| 1958305 | 8965350133319068256 | fail   |
| 1957982 | 8965356267548223920 | fail   |
| 1957603 | 8965364210016453216 | fail   |
| 1957299 | 8965369315276431472 | pass   |
| 1957184 | 8965373200453561504 | pass   |
| 1956769 | 8965380703564618464 | fail   |
| 1956722 | 8965381676059009056 | pass   |
| 1954769 | 8965435768005023104 | fail   |
| 1954740 | 8965437503828380544 | pass   |
| 1953183 | 8965467688820681664 | fail   |
| 1953154 | 8965468688681391808 | pass   |
| 1952888 | 8965478634065346816 | pass   |
| 1951414 | 8965522560900175648 | pass   |
+---------+---------------------+--------+
The failure rate is high, raising the priority.
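
For reference, the failure rate in the query result above can be tallied with a few lines of Python. This is only a rough sketch: the status list is copied by hand from the 15 rows shown (newest first), and the helper names are made up for illustration. It says nothing about whether a given failure came from a bad CL or from a flaky test.

```python
from itertools import groupby

# Statuses copied by hand from the 15 rows in the query result, newest build first.
statuses = [
    "pass", "pass", "fail", "fail", "fail",
    "pass", "pass", "fail", "pass", "fail",
    "pass", "fail", "pass", "pass", "pass",
]


def failure_rate(results):
    """Fraction of builds whose status is 'fail'."""
    return sum(1 for s in results if s == "fail") / len(results)


def longest_fail_streak(results):
    """Length of the longest run of consecutive 'fail' entries."""
    return max(
        (len(list(group)) for status, group in groupby(results) if status == "fail"),
        default=0,
    )


rate = failure_rate(statuses)
streak = longest_fail_streak(statuses)
print(f"{rate:.0%} of builds failed, longest failure streak: {streak}")
```

On this sample, 6 of the 15 builds failed, with one run of three consecutive failures.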
,
Oct 19 2017
sheriffs@, can you help find the owners for libbrillo and quipper?
,
Oct 19 2017
Re OP, this is not in the default set of pre-cqs. It _is_ in the pre-cq list for kernel 4.4 (+snanda), which may widen its impact.
https://cs.corp.google.com/chromeos_public/src/third_party/kernel/v4.4/COMMIT-QUEUE.ini?type=cs&q=kahlee-pre-cq&sq=package:chromeos&l=14
Finally, this is not an infra bug.
,
Oct 19 2017
Re #6: I'm not sure we need to be alarmed. We don't know how many of those are bad CLs. I see one series of three failures. This should have triggered a ToT pre-cq build, which would have passed. There isn't much we can do in infra in response to flakiness of this kind, but we can make it more visible sometimes... This case is particularly hard because pre-cqs fail mostly because of bad CLs (that's their purpose) and we don't run enough tot-pre-cqs to infer flakiness from them.
,
Oct 19 2017
kahlee uses the 4.12 kernel, via chipset-stnyridge:
chipset-stnyridge/profiles/base/make.defaults:USE="${USE} kernel-4_12"
Maybe drop it from COMMIT-QUEUE.ini for chromeos-4.4 and add it to chromeos-4.12?
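
To illustrate the kind of change being suggested, here is a small Python sketch that drops one builder name from a pre-cq list in an INI file. The section name `GENERAL`, the option name `pre-cq-configs`, and the sample file contents are assumptions made for this example; they are not copied from the actual COMMIT-QUEUE.ini in the kernel tree.

```python
import configparser

# Hypothetical file contents. The section and option names are assumptions
# for illustration, not the real COMMIT-QUEUE.ini from chromeos-4.4.
SAMPLE_INI = """\
[GENERAL]
pre-cq-configs: default kahlee-pre-cq
"""


def remaining_pre_cq_configs(ini_text, name,
                             section="GENERAL", option="pre-cq-configs"):
    """Return the whitespace-separated config list with `name` removed."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    configs = parser.get(section, option).split()
    return [config for config in configs if config != name]


remaining = remaining_pre_cq_configs(SAMPLE_INI, "kahlee-pre-cq")
print(" ".join(remaining))
```

Adding the builder to another branch would be the reverse edit on that branch's own file.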
,
Oct 19 2017
quipper has a team we can file a bug for. Not sure if they use crbug.com or something else, but we can throw something in their direction. libbrillo is owned by the CrOS team ;).
,
Oct 19 2017
#4: I thought I did. The link I provided points to various failed CLs, which in turn point to failed builds. Here are some.
https://luci-milo.appspot.com/buildbot/chromiumos.tryserver/pre_cq/62449
https://luci-milo.appspot.com/buildbot/chromiumos.tryserver/pre_cq/62397
https://luci-milo.appspot.com/buildbot/chromiumos.tryserver/pre_cq/61612
https://luci-milo.appspot.com/buildbot/chromiumos.tryserver/pre_cq/62204
https://luci-milo.appspot.com/buildbot/chromiumos.tryserver/pre_cq/61951
,
Oct 19 2017
CL:728366 for dropping kahlee-pre-cq from COMMIT-QUEUE.ini in chromeos-4.4.
,
Oct 19 2017
I'll take care of COMMIT-QUEUE.ini, but who should own the unit test flakiness? Any suggestions for component(s)?
,
Oct 20 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/third_party/kernel/+/d222098e5241ecc88e1c939fc884b1bbcc65b2bd

commit d222098e5241ecc88e1c939fc884b1bbcc65b2bd
Author: Guenter Roeck <groeck@chromium.org>
Date: Fri Oct 20 04:15:26 2017

COMMIT-QUEUE.ini: Drop kahlee-pre-cq

kahlee uses chromeos-4.12. Its unit tests are flaky, causing unnecessary pre-cq failures. Let's drop it from chromeos-4.4.

BUG= chromium:776369
TEST=Run pre-cq on chromeos-4.4 patches

Change-Id: Ib09799cf46adea3506c1b22048156988b3d5e046
Signed-off-by: Guenter Roeck <groeck@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/728366
Reviewed-by: Prathmesh Prabhu <pprabhu@chromium.org>
Reviewed-by: Justin TerAvest <teravest@chromium.org>

[modify] https://crrev.com/d222098e5241ecc88e1c939fc884b1bbcc65b2bd/COMMIT-QUEUE.ini
,
Dec 15 2017
,
Jun 8 2018
,
Aug 30
I see a failure on the CQ overnight on a different board (grunt) that looks very similar to this one: https://luci-milo.appspot.com/buildbot/chromeos/grunt-paladin/2668
,
Aug 30
The kahlee builder was removed, so this bug is technically obsolete.
Comment 1 by groeck@chromium.org, Oct 19 2017
Owner: nxia@chromium.org