cidb_integration_test test flake |
|||||||||||||
Issue descriptionWe now run cidb_integration_test as part of the PreCQ for chromite changes. Mostly... it works, but this looks like a flaky test failure that doesn't make any sense to me. https://luci-milo.appspot.com/buildbot/chromiumos.tryserver/pre_cq/59577 Test Insert and Get operations on hwTestResultTable. ... ok ====================================================================== ERROR: [chromite.lib.cidb_integration_test] CIDBMigrationsTest.testActions Test that InsertCLActions accepts 0-, 1-, and multi-item lists. ---------------------------------------------------------------------- Traceback (most recent call last): File "/mnt/host/source/chromite/lib/cros_test_lib.py", line 262, in _stacked_setUp StackedSetup._stacked_tearDown(obj) File "/mnt/host/source/chromite/lib/cros_test_lib.py", line 278, in _stacked_tearDown target(obj) File "/mnt/host/source/chromite/lib/cros_test_lib.py", line 1072, in tearDown None) File "/usr/lib64/python2.7/contextlib.py", line 35, in __exit__ self.gen.throw(type, value, traceback) File "/mnt/host/source/chromite/lib/parallel.py", line 750, in BackgroundTaskRunner queue.put(_AllTasksComplete()) File "/mnt/host/source/chromite/lib/parallel.py", line 747, in BackgroundTaskRunner yield queue TypeError: __init__() takes at least 3 arguments (2 given)
,
Oct 11 2017
reassign and ->P1 if recurs
,
Oct 11 2017
I only saw it once, but was worried because it happened on one of the first production PreCQ builds I saw that ran the test.
,
Oct 14 2017
.
======================================================================
ERROR: [chromite.lib.cidb_integration_test] DataSeries1Test.runTest
Simulate a single set of canary builds with database schema v56.
----------------------------------------------------------------------
Traceback (most recent call last):
File "/mnt/host/source/chromite/lib/cros_test_lib.py", line 263, in _stacked_setUp
StackedSetup._stacked_tearDown(obj)
File "/mnt/host/source/chromite/lib/cros_test_lib.py", line 279, in _stacked_tearDown
target(obj)
File "/mnt/host/source/chromite/lib/cros_test_lib.py", line 1073, in tearDown
None)
File "/usr/lib64/python2.7/contextlib.py", line 35, in __exit__
self.gen.throw(type, value, traceback)
File "/mnt/host/source/chromite/lib/parallel.py", line 751, in BackgroundTaskRunner
queue.put(_AllTasksComplete())
File "/mnt/host/source/chromite/lib/parallel.py", line 748, in BackgroundTaskRunner
yield queue
TypeError: __init__() takes at least 3 arguments (2 given)
----------------------------------------------------------------------
Ran 25 tests in 167.227s
FAILED (errors=1, skipped=1)
sometimes I can reproduce it locally
,
Oct 18 2017
Again: https://logs.chromium.org/v/?s=chromeos%2Fbb%2Fchromiumos.tryserver%2Fpre_cq%2F62180%2F%2B%2Frecipes%2Fsteps%2FCidbIntegrationTest%2F0%2Fstdout I know what is happening here. I wrote this bit a while ago.
,
Oct 19 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/chromite/+/fb7c09382221cd6d103ef331f138dacd528ba545 commit fb7c09382221cd6d103ef331f138dacd528ba545 Author: Prathmesh Prabhu <pprabhu@chromium.org> Date: Thu Oct 19 02:08:26 2017 cros_test_lib: Use addCleanup in TempDirTestCase. unittest.TestCase.addCleanup registered functions run after unittest.TestCase.tearDown. TempDirTestCase needs to use addCleanup so that users of this test case can safely use addCleanup assuming that the tempdir provided will not be cleaned up until after these cleanup functions have run. BUG=chromium:771785 TEST=cbuildbot/run_tests Change-Id: Ibf84cf16ed2b2d1e29bc8e787d17329480b7ee59 Reviewed-on: https://chromium-review.googlesource.com/726310 Commit-Ready: Prathmesh Prabhu <pprabhu@chromium.org> Tested-by: Prathmesh Prabhu <pprabhu@chromium.org> Reviewed-by: Ningning Xia <nxia@chromium.org> [modify] https://crrev.com/fb7c09382221cd6d103ef331f138dacd528ba545/lib/cros_test_lib.py
,
Oct 25 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/chromite/+/386964f9720cff98d9fc9e105b41e53b2d18d271 commit 386964f9720cff98d9fc9e105b41e53b2d18d271 Author: Prathmesh Prabhu <pprabhu@chromium.org> Date: Wed Oct 25 00:36:31 2017 cidb: Fix integration test case failure mode. One of the failure modes of cidb_integration_test was being reported badly due to a bug in the way we created the resulting exception. This CL does not fix the actual problem (probably timing related), but fixes the incorrect reporting. BUG=chromium:771785 TEST=cidb_integration_test; force bad cases. Change-Id: I44b177f3e3f43f5501124fad889cbe116ee7cc20 Reviewed-on: https://chromium-review.googlesource.com/726522 Commit-Ready: Prathmesh Prabhu <pprabhu@chromium.org> Tested-by: Prathmesh Prabhu <pprabhu@chromium.org> Reviewed-by: Ningning Xia <nxia@chromium.org> [modify] https://crrev.com/386964f9720cff98d9fc9e105b41e53b2d18d271/lib/cros_test_lib.py
,
Oct 25 2017
We now wait for this to strike again with the real failure reason :)
,
Nov 3 2017
Got a failure
======================================================================
ERROR: [chromite.lib.cidb_integration_test] BuildTableTest.testBuildbucketId
Test InsertBuild with buildbucket_id.
----------------------------------------------------------------------
Traceback (most recent call last):
File "/mnt/host/source/chromite/lib/cros_test_lib.py", line 1053, in <lambda>
'mysqladmin failed to ping mysqld: %s' % e))
File "/mnt/host/source/chromite/lib/cros_test_lib.py", line 1084, in _CleanupMysqld
None,
File "/usr/lib64/python2.7/contextlib.py", line 35, in __exit__
self.gen.throw(type, value, traceback)
File "/mnt/host/source/chromite/lib/parallel.py", line 751, in BackgroundTaskRunner
queue.put(_AllTasksComplete())
File "/mnt/host/source/chromite/lib/parallel.py", line 748, in BackgroundTaskRunner
yield queue
Exception: mysqladmin failed to ping mysqld: return code: 1; command: mysqladmin -S /tmp/chromite.testd2SH5F/mysqld_dir/mysqld.socket ping
mysqladmin: connect to server at 'localhost' failed
error: 'Can't connect to local MySQL server through socket '/tmp/chromite.testd2SH5F/mysqld_dir/mysqld.socket' (2)'
Check that mysqld is running and that the socket: '/tmp/chromite.testd2SH5F/mysqld_dir/mysqld.socket' exists!
cmd=['mysqladmin', '-S', '/tmp/chromite.testd2SH5F/mysqld_dir/mysqld.socket', 'ping']. We force killed the mysqld process.
,
Nov 3 2017
The ping command failed after 5 times retries. Checked the log, the first couple 'ping' often failed and got retired for other passed test. probably we can add the sleep time between the 'ping' retries, and add sleep time after a 'shutdown' operation. 171103 15:40:49 InnoDB: Shutdown completed; log sequence number 2046640 171103 15:40:49 [Note] /usr/sbin/mysqld: Shutdown complete .15:40:49: DEBUG: RunCommand: /usr/share/mysql/scripts/mysql_install_db --no-defaults '--basedir=/usr' '--ldata=/tmp/chromite.testXSY5hV/mysqld_dir' 15:40:49: DEBUG: RunCommand: mysqladmin -S /tmp/chromite.testXSY5hV/mysqld_dir/mysqld.socket ping 15:40:49: WARNING: Command failed with retriable error. return code: 1; command: mysqladmin -S /tmp/chromite.testXSY5hV/mysqld_dir/mysqld.socket ping mysqladmin: connect to server at 'localhost' failed error: 'Can't connect to local MySQL server through socket '/tmp/chromite.testXSY5hV/mysqld_dir/mysqld.socket' (2)' Check that mysqld is running and that the socket: '/tmp/chromite.testXSY5hV/mysqld_dir/mysqld.socket' exists! cmd=['mysqladmin', '-S', '/tmp/chromite.testXSY5hV/mysqld_dir/mysqld.socket', 'ping'] 15:40:49: DEBUG: <class 'chromite.lib.cros_build_lib.RunCommandError'>(return code: 1; command: mysqladmin -S /tmp/chromite.testXSY5hV/mysqld_dir/mysqld.socket ping mysqladmin: connect to server at 'localhost' failed error: 'Can't connect to local MySQL server through socket '/tmp/chromite.testXSY5hV/mysqld_dir/mysqld.socket' (2)' Check that mysqld is running and that the socket: '/tmp/chromite.testXSY5hV/mysqld_dir/mysqld.socket' exists!
,
Nov 4 2017
We already have retries for ping with a backoff factor of 1.5: https://cs.corp.google.com/chromeos_public/chromite/lib/cros_test_lib.py?l=1049 Also, we wait for the mysqld process to end before continuing after we run 'mysqladmin shutdown', so we're actually sure mysqld is gone before we continue. On a loaded server, even 1 + 1*1.5 + 1* 1.5^2 + 1* 1.5^3 + 1* 1.5^4 = 13.1875 is not enough. Two points here: - My backoff factor is stupid. It should at least be 2 . Without any backoff factor, the timeout would have been : 1 + 2 + 3 + 4 + 5 = 15 - Such timeouts tend to make for flaky tests on the builders. We should increase the timeout to practially mean "forever". Let's make it backoff_factor = 2, # attempts = 8, i.e 1 + 2 + 4 + 8 + 16 + 32 + 64 + 128 = 255s total timeout. If the mysql server doesn't come up for 4 minutes, we can safely fail the test (oh, come on!) and it's not prohibitively long for the builders to hang for. https://chromium-review.googlesource.com/c/chromiumos/chromite/+/754380
,
Nov 4 2017
My guess is that if the server doesn't want to come up within 15 seconds, it probably won't come up in 4 minutes either. If it's hung, it's hung. Instead of a backoff-and-wait strategy, what about a kill-it-and-try-again one?
,
Nov 4 2017
Re #12: 12 seconds isn't nearly good enough on a loaded builder. This reminds of the flaky EC test we've marked expectedFailure recently. Basically reasonable timeout assumptions don't work on the builders. - unittests should mock out tests - integration tests (like this one), should keep a super long timeout that practically means "wait forever". We could retry at the test level rather than in the fixture.
,
Nov 4 2017
*unittests should mock out time.
,
Nov 8 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/chromite/+/7ba82062bd6b5f1146c8bc82d2f9cd13d482ceaf commit 7ba82062bd6b5f1146c8bc82d2f9cd13d482ceaf Author: Prathmesh Prabhu <pprabhu@chromium.org> Date: Wed Nov 08 23:10:15 2017 cidb_integration_test: Wait longer for mysqld to initialize. BUG=chromium:771785 TEST=cidb_integration_test Change-Id: I033516d8400e16858c492c080cb546fa8c490160 Reviewed-on: https://chromium-review.googlesource.com/754380 Commit-Ready: Prathmesh Prabhu <pprabhu@chromium.org> Tested-by: Prathmesh Prabhu <pprabhu@chromium.org> Reviewed-by: Prathmesh Prabhu <pprabhu@chromium.org> Reviewed-by: Paul Hobbs <phobbs@google.com> Reviewed-by: Ningning Xia <nxia@chromium.org> [modify] https://crrev.com/7ba82062bd6b5f1146c8bc82d2f9cd13d482ceaf/lib/cros_test_lib.py
,
Nov 9 2017
Speculative 'Fixed'. Reopen if re-seen.
,
Nov 18 2017
Re-seen. https://luci-milo.appspot.com/buildbot/chromiumos.tryserver/pre_cq/68788 Traceback (most recent call last): File "/mnt/host/source/chromite/lib/cros_test_lib.py", line 259, in _stacked_setUp target(obj) File "/mnt/host/source/chromite/lib/cros_test_lib.py", line 1054, in setUp sleep=1, backoff_factor=2) File "/mnt/host/source/chromite/lib/retry_util.py", line 353, in RunCommandWithRetries return RetryCommand(cros_build_lib.RunCommand, max_retry, *args, **kwargs) File "/mnt/host/source/chromite/lib/retry_util.py", line 325, in RetryCommand return GenericRetry(ShouldRetry, max_retry, functor, *args, **kwargs) File "/mnt/host/source/chromite/lib/retry_util.py", line 244, in GenericRetry return _run() File "/mnt/host/source/chromite/lib/retry_util.py", line 177, in _Wrapper ret = func(*args, **kwargs) File "/mnt/host/source/chromite/lib/retry_util.py", line 243, in _run return functor(*args, **kwargs) File "/mnt/host/source/chromite/lib/cros_build_lib.py", line 654, in RunCommand raise RunCommandError(msg, cmd_result) RunCommandError: return code: 1; command: mysqladmin -S /tmp/chromite.testfgib1P/mysqld_dir/mysqld.socket ping mysqladmin: connect to server at 'localhost' failed
,
Jan 12 2018
,
Jan 12 2018
This is confusing.
It looks like it was running a test twice (in parallel?)
======================================================================
ERROR: [chromite.lib.cidb_integration_test] CIDBAPITest.testGetTime
----------------------------------------------------------------------
Traceback (most recent call last):
File "/mnt/host/source/chromite/lib/cros_test_lib.py", line 259, in _stacked_setUp
target(obj)
File "/mnt/host/source/chromite/lib/cros_test_lib.py", line 1074, in setUp
sleep=1, backoff_factor=2)
File "/mnt/host/source/chromite/lib/retry_util.py", line 353, in RunCommandWithRetries
return RetryCommand(cros_build_lib.RunCommand, max_retry, *args, **kwargs)
File "/mnt/host/source/chromite/lib/retry_util.py", line 325, in RetryCommand
return GenericRetry(ShouldRetry, max_retry, functor, *args, **kwargs)
File "/mnt/host/source/chromite/lib/retry_util.py", line 244, in GenericRetry
return _run()
File "/mnt/host/source/chromite/lib/retry_util.py", line 177, in _Wrapper
ret = func(*args, **kwargs)
File "/mnt/host/source/chromite/lib/retry_util.py", line 243, in _run
return functor(*args, **kwargs)
File "/mnt/host/source/chromite/lib/cros_build_lib.py", line 654, in RunCommand
raise RunCommandError(msg, cmd_result)
RunCommandError: return code: 1; command: mysqladmin -S /tmp/chromite.testaW3cYD/mysqld_dir/mysqld.socket ping
mysqladmin: connect to server at 'localhost' failed
error: 'Can't connect to local MySQL server through socket '/tmp/chromite.testaW3cYD/mysqld_dir/mysqld.socket' (2)'
Check that mysqld is running and that the socket: '/tmp/chromite.testaW3cYD/mysqld_dir/mysqld.socket' exists!
cmd=['mysqladmin', '-S', '/tmp/chromite.testaW3cYD/mysqld_dir/mysqld.socket', 'ping']
======================================================================
ERROR: [chromite.lib.cidb_integration_test] CIDBAPITest.testGetTime
----------------------------------------------------------------------
Traceback (most recent call last):
File "/mnt/host/source/chromite/lib/cros_test_lib.py", line 1077, in <lambda>
'mysqladmin failed to ping mysqld: %s' % e))
File "/mnt/host/source/chromite/lib/cros_test_lib.py", line 1108, in _CleanupMysqld
None,
File "/usr/lib64/python2.7/contextlib.py", line 35, in __exit__
self.gen.throw(type, value, traceback)
File "/mnt/host/source/chromite/lib/parallel.py", line 751, in BackgroundTaskRunner
queue.put(_AllTasksComplete())
File "/mnt/host/source/chromite/lib/parallel.py", line 748, in BackgroundTaskRunner
yield queue
Exception: mysqladmin failed to ping mysqld: return code: 1; command: mysqladmin -S /tmp/chromite.testaW3cYD/mysqld_dir/mysqld.socket ping
mysqladmin: connect to server at 'localhost' failed
error: 'Can't connect to local MySQL server through socket '/tmp/chromite.testaW3cYD/mysqld_dir/mysqld.socket' (2)'
Check that mysqld is running and that the socket: '/tmp/chromite.testaW3cYD/mysqld_dir/mysqld.socket' exists!
cmd=['mysqladmin', '-S', '/tmp/chromite.testaW3cYD/mysqld_dir/mysqld.socket', 'ping']. We force killed the mysqld process.
----------------------------------------------------------------------
Ran 27 tests in 440.649s
FAILED (errors=2, skipped=1)
The two errors are the same test failing twice...
,
Mar 30 2018
,
Mar 30 2018
,
May 8 2018
No longer the right team ownership. Onto triage queue for CI.
,
May 8 2018
Looks like a good starter bug. Will find owner.
,
Jun 8 2018
,
Aug 2
|
|||||||||||||
►
Sign in to add a comment |
|||||||||||||
Comment 1 by akes...@chromium.org
, Oct 11 2017