
Issue 776369

Starred by 1 user

Issue metadata

Status: Closed
Owner: ----
Closed: Aug 30
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug

Blocking:
issue 879206




kahlee unit tests failing even for unrelated CLs

Project Member Reported by groeck@chromium.org, Oct 19 2017

Issue description

https://chromium-review.googlesource.com/#/q/comment:quipper+comment:libbrillo

shows kahlee unit tests failing for various CLs even if the CL is completely unrelated to kahlee.

From error logs:

quipper-0.0.1-r2021: Error: /var/cache/portage/chromeos-base/quipper/out/Default/perf_recorder_test: failed with signal SIGIOT|SIGABRT(6)
quipper-0.0.1-r2021:  * ERROR: chromeos-base/quipper-0.0.1-r2021::chromiumos failed (test phase):
quipper-0.0.1-r2021:  *   (no error message)
quipper-0.0.1-r2021:  * 
quipper-0.0.1-r2021:  * Call stack:
quipper-0.0.1-r2021:  *     ebuild.sh, line   93:  Called src_test
quipper-0.0.1-r2021:  *   environment, line 3608:  Called platform_src_test
quipper-0.0.1-r2021:  *   environment, line 3206:  Called platform_pkg_test
quipper-0.0.1-r2021:  *   environment, line 3188:  Called platform_test 'run' '/build/kahlee/var/cache/portage/chromeos-base/quipper/out/Default/perf_recorder_test' '1'
quipper-0.0.1-r2021:  *   environment, line 3239:  Called die

libbrillo-0.0.1-r936: [ERROR:stream_utils_unittest.cc(142)] TestBody(...): Domain=stream.io, Code=invalid_parameter, Message=The stream offset value is out of range
libbrillo-0.0.1-r936: [ERROR:stream_utils_unittest.cc(147)] TestBody(...): Domain=stream.io, Code=invalid_parameter, Message=The stream offset value is out of range
libbrillo-0.0.1-r936: [ERROR:stream_utils_unittest.cc(151)] TestBody(...): Domain=stream.io, Code=invalid_parameter, Message=The stream offset value is out of range

[ and many similar ]

libbrillo-0.0.1-r936: [ERROR:message.cc(902)] Failed to parse protocol buffer from array
libbrillo-0.0.1-r936: [ERROR:dbus_method_invoker_unittest.cc(104)] Unexpected method call: message_type: MESSAGE_METHOD_CALL
libbrillo-0.0.1-r936: interface: org.test.Object.TestInterface
libbrillo-0.0.1-r936: member: TestMethod3
libbrillo-0.0.1-r936: signature: ay
libbrillo-0.0.1-r936: 
libbrillo-0.0.1-r936: array [
libbrillo-0.0.1-r936:   byte 8
libbrillo-0.0.1-r936:   byte 123
libbrillo-0.0.1-r936:   byte 18
libbrillo-0.0.1-r936:   byte 3
libbrillo-0.0.1-r936:   byte 98
libbrillo-0.0.1-r936:   byte 97
libbrillo-0.0.1-r936:   byte 114
libbrillo-0.0.1-r936:   byte 0
libbrillo-0.0.1-r936:   byte 0
libbrillo-0.0.1-r936:   byte 0
libbrillo-0.0.1-r936:   byte 0
libbrillo-0.0.1-r936:   byte 0
libbrillo-0.0.1-r936:   byte 0
libbrillo-0.0.1-r936:   byte 0
libbrillo-0.0.1-r936: ]
libbrillo-0.0.1-r936: 
libbrillo-0.0.1-r936: [ERROR:dbus_method_invoker.h(117)] CallMethodAndBlockWithTimeout(...): Domain=dbus, Code=org.freedesktop.DBus.Error.Failed, Message=Failed to call D-Bus method: org.test.Object.TestInterface.TestMethod3
libbrillo-0.0.1-r936: ../../../../../../../tmp/portage/chromeos-base/libbrillo-0.0.1-r936/work/libbrillo-0.0.1/platform2/libbrillo/brillo/dbus/dbus_method_invoker_unittest.cc:126: Failure
libbrillo-0.0.1-r936: Expected: (nullptr) != (response.get()), actual: 8-byte object <00-00 00-00 00-00 00-00> vs NULL
libbrillo-0.0.1-r936: [FATAL:dbus_method_invoker.h(228)] Check failed: message. Unable to extract parameters from a NULL message.
libbrillo-0.0.1-r936: /usr/lib64/libbase-core-395517.so(base::debug::StackTrace::StackTrace()+0x13) [0x7f534d015dd3]
libbrillo-0.0.1-r936: 
libbrillo-0.0.1-r936: Error: /var/cache/portage/chromeos-base/libbrillo/out/Default/libbrillo-395517_unittests: failed with signal SIGIOT|SIGABRT(6)
libbrillo-0.0.1-r936:  * ERROR: chromeos-base/libbrillo-0.0.1-r936::chromiumos failed (test phase):
libbrillo-0.0.1-r936:  *   (no error message)
libbrillo-0.0.1-r936:  * 
libbrillo-0.0.1-r936:  * Call stack:
libbrillo-0.0.1-r936:  *     ebuild.sh, line   93:  Called src_test
libbrillo-0.0.1-r936:  *   environment, line 3647:  Called platform_src_test
libbrillo-0.0.1-r936:  *   environment, line 3231:  Called platform_pkg_test
libbrillo-0.0.1-r936:  *   environment, line 3211:  Called platform_test 'run' '/build/kahlee/var/cache/portage/chromeos-base/libbrillo/out/Default/libbrillo-395517_unittests'
libbrillo-0.0.1-r936:  *   environment, line 3264:  Called die
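
For reference, the byte array dumped for TestMethod3 above can be decoded by hand. The following is a minimal sketch, not the protobuf library's parser: `read_varint` and `decode_fields` are hypothetical helpers written for this illustration, and only wire types 0 (varint) and 2 (length-delimited) are handled. It suggests the first 7 bytes are a well-formed message (field 1 = 123, field 2 = "bar") followed by 7 zero bytes. A tag with field number 0 is invalid in the wire format, so the trailing zeros would plausibly explain "Failed to parse protocol buffer from array", though the logs alone do not confirm where the padding comes from.

```python
def read_varint(buf, pos):
    """Decode one base-128 varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def decode_fields(buf):
    """Walk the wire format; return (fields, offset where decoding stopped)."""
    fields = []
    pos = 0
    while pos < len(buf):
        start = pos
        tag, pos = read_varint(buf, pos)
        field_number, wire_type = tag >> 3, tag & 0x7
        if field_number == 0:
            # Field number 0 is not a valid tag; stop here.
            return fields, start
        if wire_type == 0:  # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:  # length-delimited (bytes/string)
            length, pos = read_varint(buf, pos)
            value = bytes(buf[pos:pos + length])
            pos += length
        else:
            # Other wire types are not needed for this dump.
            return fields, start
        fields.append((field_number, wire_type, value))
    return fields, pos


# Bytes copied from the "array [ ... ]" dump in the log above.
LOGGED = bytes([8, 123, 18, 3, 98, 97, 114, 0, 0, 0, 0, 0, 0, 0])

fields, stopped_at = decode_fields(LOGGED)
print(fields)               # decoded (field_number, wire_type, value) tuples
print(stopped_at)           # offset of the first byte that is not a valid tag
print(LOGGED[stopped_at:])  # whatever trails the well-formed prefix
```

If this reading is right, the receiver is being handed a buffer longer than the serialized message, which would point at the test environment (buffer sizing or message marshalling) rather than at the CL under test.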

Looking through my own logs, it appears that "cros tryjob ... kahlee-paladin kahlee-release" has been failing consistently since at least August.

Assigning to Infra - maybe kahlee-pre-cq should not be used for standard pre-cq runs if it is known to be failing. Please reassign if this is the wrong approach.

 

Comment 1 by groeck@chromium.org, Oct 19 2017

Cc: nxia@chromium.org sarthakkukreti@chromium.org xiaochu@chromium.org
Owner: nxia@chromium.org
Cc: jclinton@chromium.org
Cc: bmgordon@chromium.org

Comment 4 by nxia@chromium.org, Oct 19 2017

groeck@, can you paste the link to the unit test failures? Did the unit tests fail because of infra issues?

I see kahlee-paladin has been passing, except for some aborts caused by CQ self-destruction.
https://uberchromegw.corp.google.com/i/chromeos/builders/kahlee-paladin

Comment 5 by nxia@chromium.org, Oct 19 2017

Cc: pprabhu@chromium.org
Owner: ----
Found an example:
https://uberchromegw.corp.google.com/i/chromiumos.tryserver/builders/pre_cq/builds/62449/steps/UnitTest/logs/stdio



quipper-0.0.1-r2055: [libprotobuf FATAL ../../protobuf-3.3.0/src/google/protobuf/message_lite.cc:71] CHECK failed: (bytes_produced_by_serialization) == (byte_size_before_serialization): Byte size calculation and serialization were inconsistent.  This may indicate a bug in protocol buffers or it may be caused by concurrent modification of quipper.PerfDataProto.
quipper-0.0.1-r2055: Error: /var/cache/portage/chromeos-base/quipper/out/Default/perf_recorder_test: failed with signal SIGIOT|SIGABRT(6)
quipper-0.0.1-r2055:  * ERROR: chromeos-base/quipper-0.0.1-r2055::chromiumos failed (test phase):




libbrillo-0.0.1-r970: [ERROR:dbus_method_invoker.h(117)] CallMethodAndBlockWithTimeout(...): Domain=dbus, Code=org.freedesktop.DBus.Error.Failed, Message=Failed to call D-Bus method: org.test.Object.TestInterface.TestMethod3
libbrillo-0.0.1-r970: ../../../../../../../tmp/portage/chromeos-base/libbrillo-0.0.1-r970/work/libbrillo-0.0.1/platform2/libbrillo/brillo/dbus/dbus_method_invoker_unittest.cc:126: Failure
libbrillo-0.0.1-r970: Expected: (nullptr) != (response.get()), actual: 8-byte object <00-00 00-00 00-00 00-00> vs NULL
libbrillo-0.0.1-r970: [FATAL:dbus_method_invoker.h(228)] Check failed: message. Unable to extract parameters from a NULL message.
libbrillo-0.0.1-r970: /usr/lib64/libbase-core-395517.so(base::debug::StackTrace::StackTrace()+0x13) [0x7f9dee8fbdd3]
libbrillo-0.0.1-r970: 
libbrillo-0.0.1-r970: Error: /var/cache/portage/chromeos-base/libbrillo/out/Default/libbrillo-395517_unittests: failed with signal SIGIOT|SIGABRT(6)
libbrillo-0.0.1-r970:  * ERROR: chromeos-base/libbrillo-0.0.1-r970::chromiumos failed (test phase):
libbrillo-0.0.1-r970:  *   (no error message)
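
To compare failing builds quickly, the "failed with signal" lines can be pulled out of a UnitTest stdio log. This is a rough sketch only: the regular expression is an assumption based on the handful of lines pasted in this bug, and `failed_tests` is a hypothetical helper, not part of chromite or any existing tool.

```python
import re
from pathlib import PurePosixPath

# Assumed line shape, taken from the log excerpts in this bug:
#   <package>: Error: <path to test binary>: failed with signal <signal>
FAILURE_RE = re.compile(
    r"^(?P<package>[\w.+-]+): Error: (?P<binary>\S+): "
    r"failed with signal (?P<signal>\S+)$"
)


def failed_tests(log_lines):
    """Return (package, test binary name, signal) for each crashed test."""
    failures = []
    for line in log_lines:
        match = FAILURE_RE.match(line.strip())
        if match:
            binary_name = PurePosixPath(match["binary"]).name
            failures.append((match["package"], binary_name, match["signal"]))
    return failures


# Lines copied from the example build log quoted above.
SAMPLE = [
    "quipper-0.0.1-r2055: Error: /var/cache/portage/chromeos-base/quipper/out/Default/perf_recorder_test: failed with signal SIGIOT|SIGABRT(6)",
    "quipper-0.0.1-r2055:  * ERROR: chromeos-base/quipper-0.0.1-r2055::chromiumos failed (test phase):",
    "libbrillo-0.0.1-r970: Error: /var/cache/portage/chromeos-base/libbrillo/out/Default/libbrillo-395517_unittests: failed with signal SIGIOT|SIGABRT(6)",
    "libbrillo-0.0.1-r970:  *   (no error message)",
]

results = failed_tests(SAMPLE)
for package, binary, signal in results:
    print(f"{package}: {binary} aborted with {signal}")
```

Running this over several failed builds would show whether the same two test binaries abort every time, which is what the excerpts so far suggest.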





I also found some kahlee-pre-cq builds that passed:
https://uberchromegw.corp.google.com/i/chromiumos.tryserver/builders/pre_cq/builds/62483
https://uberchromegw.corp.google.com/i/chromiumos.tryserver/builders/pre_cq/builds/62514

and some kahlee-pre-cq builds that failed:

https://uberchromegw.corp.google.com/i/chromiumos.tryserver/builders/pre_cq/builds/62473

https://uberchromegw.corp.google.com/i/chromiumos.tryserver/builders/pre_cq/builds/62449


The unit tests in kahlee-pre-cq are flaky. We should either find owners from libbrillo and quipper to fix the unit tests, or remove the unit tests if necessary.


Comment 6 by nxia@chromium.org, Oct 19 2017

Labels: -Pri-2 Pri-1
mysql> select id, buildbucket_id, status from buildTable where build_config='kahlee-pre-cq' order by id desc limit 15;
+---------+---------------------+--------+
| id      | buildbucket_id      | status |
+---------+---------------------+--------+
| 1958878 | 8965337783442382464 | pass   |
| 1958401 | 8965347412157315936 | pass   |
| 1958305 | 8965350133319068256 | fail   |
| 1957982 | 8965356267548223920 | fail   |
| 1957603 | 8965364210016453216 | fail   |
| 1957299 | 8965369315276431472 | pass   |
| 1957184 | 8965373200453561504 | pass   |
| 1956769 | 8965380703564618464 | fail   |
| 1956722 | 8965381676059009056 | pass   |
| 1954769 | 8965435768005023104 | fail   |
| 1954740 | 8965437503828380544 | pass   |
| 1953183 | 8965467688820681664 | fail   |
| 1953154 | 8965468688681391808 | pass   |
| 1952888 | 8965478634065346816 | pass   |
| 1951414 | 8965522560900175648 | pass   |
+---------+---------------------+--------+


The failure rate is high (6 of the last 15 builds failed), so I am raising the priority.
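
As a quick sanity check on the query result above, the failure rate and the longest run of consecutive failures can be computed from the status column. This is only a sketch: the statuses are transcribed by hand from the table (newest build first), and `failure_rate` and `longest_fail_streak` are hypothetical helper names.

```python
from itertools import groupby

# Status column from the buildTable query above, newest build first.
STATUSES = [
    "pass", "pass", "fail", "fail", "fail",
    "pass", "pass", "fail", "pass", "fail",
    "pass", "fail", "pass", "pass", "pass",
]


def failure_rate(statuses):
    """Fraction of builds whose status is 'fail'."""
    return statuses.count("fail") / len(statuses)


def longest_fail_streak(statuses):
    """Length of the longest run of back-to-back failures."""
    return max(
        (sum(1 for _ in group)
         for status, group in groupby(statuses)
         if status == "fail"),
        default=0,
    )


rate = failure_rate(STATUSES)
streak = longest_fail_streak(STATUSES)
print(f"{STATUSES.count('fail')} of {len(STATUSES)} builds failed ({rate:.0%})")
print(f"longest run of consecutive failures: {streak}")
```

Note that these numbers say nothing about cause. A pre-cq failure may come from a bad CL rather than a flaky test, so the raw rate is an upper bound on flakiness at best.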

Comment 7 by nxia@chromium.org, Oct 19 2017

sheriffs@, can you help find the owners for libbrillo and quipper?
Cc: snanda@chromium.org
Components: -Infra>Client>ChromeOS
Re OP: this is not in the default set of pre-cqs.

It _is_ in the pre-cq list for kernel 4.4 (+snanda) which may widen its impact.
https://cs.corp.google.com/chromeos_public/src/third_party/kernel/v4.4/COMMIT-QUEUE.ini?type=cs&q=kahlee-pre-cq&sq=package:chromeos&l=14

Finally, this is not an infra bug.
Cc: vapier@chromium.org
Labels: OS-Chrome
Re #6: 
I'm not sure we need to be alarmed. We don't know how many of those are bad CLs.
I see one series of three failures. This should have triggered a ToT pre-cq build, which would have passed.

There isn't much we can do in infra in response to flakiness of this kind, but we can make it more visible sometimes...
This case is particularly hard because pre-cqs fail mostly because of bad CLs (that's their purpose) and we don't run enough tot-pre-cqs to infer flakiness from them.
Cc: chromeos-kahlee@google.com
kahlee uses the 4.12 kernel, via chipset-stnyridge:

chipset-stnyridge/profiles/base/make.defaults:USE="${USE} kernel-4_12"

Maybe drop it from COMMIT-QUEUE.ini for chromeos-4.4 and add it to chromeos-4.12?

quipper has a team we can file a bug with. Not sure if they use crbug.com or something else, but we can throw something in their direction.

libbrillo is owned by the CrOS team ;).
CL:728366 for dropping kahlee-pre-cq from COMMIT-QUEUE.ini in chromeos-4.4.


I'll take care of COMMIT-QUEUE.ini, but who should own the unit test flakiness? Any suggestions for component(s)?

Project Member

Comment 16 by bugdroid1@chromium.org, Oct 20 2017

Labels: merge-merged-chromeos-4.4
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/d222098e5241ecc88e1c939fc884b1bbcc65b2bd

commit d222098e5241ecc88e1c939fc884b1bbcc65b2bd
Author: Guenter Roeck <groeck@chromium.org>
Date: Fri Oct 20 04:15:26 2017

COMMIT-QUEUE.ini: Drop kahlee-pre-cq

kahlee uses chromeos-4.12. Its unit tests are flaky, causing unnecessary
pre-cq failures. Let's drop it from chromeos-4.4.

BUG= chromium:776369 
TEST=Run pre-cq on chromeos-4.4 patches

Change-Id: Ib09799cf46adea3506c1b22048156988b3d5e046
Signed-off-by: Guenter Roeck <groeck@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/728366
Reviewed-by: Prathmesh Prabhu <pprabhu@chromium.org>
Reviewed-by: Justin TerAvest <teravest@chromium.org>

[modify] https://crrev.com/d222098e5241ecc88e1c939fc884b1bbcc65b2bd/COMMIT-QUEUE.ini

Components: Test

Comment 18 by nxia@chromium.org, Jun 8 2018

Cc: -nxia@chromium.org
Components: -Test Infra>Client>ChromeOS>Build
I see a failure on the CQ overnight on a different board (grunt) that looks very similar to this one: https://luci-milo.appspot.com/buildbot/chromeos/grunt-paladin/2668

Owner: sque@chromium.org
@20 - the grunt-paladin quipper failure is tracked in crbug.com/879206
Owner: ----
Status: Closed (was: Untriaged)
The kahlee builder was removed, so this bug is technically obsolete.
Blocking: 879206
