lxc-start failing in HWTest for electro and basking |
|||||||
Issue descriptionSince Nov 22 (two days ago) HWTest for electro and basking are failing when they run their only AutoTest server test: provision_AutoUpdate.double. See failure history here: https://uberchromegw.corp.google.com/i/chromeos/builders/reef-paladin I've temporarily disabled CQ testing of electro and basking until this is resolved so this is urgent. The failure is long before autotest is even downloaded to the LXC container: 11/24 08:18:58.955 DEBUG| utils:0212| Running 'sudo lxc-start -P /usr/local/autotest/containers -n test_158364604_1511540257_21263 -d' 11/24 08:19:04.988 DEBUG| utils:0212| Running 'sudo lxc-ls -P /usr/local/autotest/containers -f -F name,state' 11/24 08:19:05.781 DEBUG| utils:0212| Running 'sudo test -e "/usr/local/autotest/containers/base_09/container_id.p"' 11/24 08:19:06.423 DEBUG| utils:0212| Running 'sudo test -e "/usr/local/autotest/containers/ssh/container_id.p"' 11/24 08:19:07.070 DEBUG| utils:0212| Running 'sudo test -e "/usr/local/autotest/containers/test_157447011_1511476601_7889/container_id.p"' 11/24 08:19:07.731 DEBUG| utils:0212| Running 'sudo cat /usr/local/autotest/containers/test_157447011_1511476601_7889/container_id.p' 11/24 08:19:08.398 DEBUG| utils:0212| Running 'sudo test -e "/usr/local/autotest/containers/test_157671071_1511295376_8966/container_id.p"' 11/24 08:19:09.047 DEBUG| utils:0212| Running 'sudo test -e "/usr/local/autotest/containers/test_157671139_1511295376_8969/container_id.p"' 11/24 08:19:09.701 DEBUG| utils:0212| Running 'sudo test -e "/usr/local/autotest/containers/test_157866481_1511475698_23835/container_id.p"' 11/24 08:19:10.370 DEBUG| utils:0212| Running 'sudo cat /usr/local/autotest/containers/test_157866481_1511475698_23835/container_id.p' 11/24 08:19:11.043 DEBUG| utils:0212| Running 'sudo test -e "/usr/local/autotest/containers/test_157866655_1511477757_9447/container_id.p"' 11/24 08:19:11.730 DEBUG| utils:0212| Running 'sudo cat /usr/local/autotest/containers/test_157866655_1511477757_9447/container_id.p' 11/24 08:19:12.397 DEBUG| utils:0212| Running 'sudo test -e "/usr/local/autotest/containers/test_157867245_1511493564_15798/container_id.p"' 11/24 08:19:13.083 DEBUG| utils:0212| Running 'sudo cat /usr/local/autotest/containers/test_157867245_1511493564_15798/container_id.p' ... 11/24 08:20:24.068 DEBUG| container:0368| Destroying container /usr/local/autotest/containers/test_158364604_1511540257_21263 11/24 08:20:24.070 DEBUG| utils:0212| Running 'sudo lxc-destroy -P /usr/local/autotest/containers -n test_158364604_1511540257_21263 -f' 11/24 08:20:26.831 INFO | server_job:0213| FAIL ---- ---- timestamp=1511540426 localtime=Nov 24 08:20:26 Failed to setup container for test: Command <sudo lxc-start -P /usr/local/autotest/containers -n test_158364604_1511540257_21263 -d> failed, rc=1, Command returned non-zero exit status * Command: sudo lxc-start -P /usr/local/autotest/containers -n test_158364604_1511540257_21263 -d Exit status: 1 Duration: 5.97342014313 stderr: lxc-start: tools/lxc_start.c: main: 366 The container failed to start. lxc-start: tools/lxc_start.c: main: 368 To get more details, run the container in foreground mode. lxc-start: tools/lxc_start.c: main: 370 Additional information can be obtained by setting the --logfile and --logpriority options.. Check logs in ssp_logs folder for more details. 11/24 08:20:26.833 DEBUG| utils:0212| Running 'sudo -n chown -R 99158 "/usr/local/autotest/results/158364604-chromeos-test/chromeos6-row3-rack12-host13"' 11/24 08:20:27.483 DEBUG| utils:0212| Running 'sudo -n chgrp -R 5000 "/usr/local/autotest/results/158364604-chromeos-test/chromeos6-row3-rack12-host13"' 11/24 08:20:28.124 ERROR| traceback:0013| Traceback (most recent call last): 11/24 08:20:28.125 ERROR| traceback:0013| File "/usr/local/autotest/server/autoserv", line 507, in run_autoserv 11/24 08:20:28.126 ERROR| traceback:0013| machines) 11/24 08:20:28.126 ERROR| traceback:0013| File "/usr/local/autotest/server/autoserv", line 168, in _run_with_ssp 11/24 08:20:28.127 ERROR| traceback:0013| dut_name=dut_name) 11/24 08:20:28.127 ERROR| traceback:0013| File "/usr/local/autotest/site-packages/chromite/lib/metrics.py", line 483, in wrapper 11/24 08:20:28.128 ERROR| traceback:0013| return fn(*args, **kwargs) 11/24 08:20:28.129 ERROR| traceback:0013| File "/usr/local/autotest/site_utils/lxc/cleanup_if_fail.py", line 40, in func_cleanup_if_fail 11/24 08:20:28.129 ERROR| traceback:0013| return func(*args, **kwargs) 11/24 08:20:28.130 ERROR| traceback:0013| File "/usr/local/autotest/site_utils/lxc/container_bucket.py", line 186, in setup_test 11/24 08:20:28.130 ERROR| traceback:0013| container.start(wait_for_network=True) 11/24 08:20:28.131 ERROR| traceback:0013| File "/usr/local/autotest/site-packages/chromite/lib/metrics.py", line 483, in wrapper 11/24 08:20:28.131 ERROR| traceback:0013| return fn(*args, **kwargs) 11/24 08:20:28.132 ERROR| traceback:0013| File "/usr/local/autotest/site_utils/lxc/container.py", line 319, in start 11/24 08:20:28.133 ERROR| traceback:0013| output = utils.run(cmd).stdout 11/24 08:20:28.133 ERROR| traceback:0013| File "/usr/local/autotest/client/common_lib/utils.py", line 738, in run 11/24 08:20:28.135 ERROR| traceback:0013| "Command returned non-zero exit status") 11/24 08:20:28.136 ERROR| traceback:0013| CmdError: Command <sudo lxc-start -P /usr/local/autotest/containers -n test_158364604_1511540257_21263 -d> failed, rc=1, Command returned non-zero exit status 11/24 08:20:28.136 ERROR| traceback:0013| * Command: 11/24 08:20:28.136 ERROR| traceback:0013| sudo lxc-start -P /usr/local/autotest/containers -n 11/24 08:20:28.137 ERROR| traceback:0013| test_158364604_1511540257_21263 -d 11/24 08:20:28.137 ERROR| traceback:0013| Exit status: 1 11/24 08:20:28.137 ERROR| traceback:0013| Duration: 5.97342014313 11/24 08:20:28.138 ERROR| traceback:0013| 11/24 08:20:28.138 ERROR| traceback:0013| stderr: 11/24 08:20:28.138 ERROR| traceback:0013| lxc-start: tools/lxc_start.c: main: 366 The container failed to start. 11/24 08:20:28.139 ERROR| traceback:0013| lxc-start: tools/lxc_start.c: main: 368 To get more details, run the container in foreground mode. 11/24 08:20:28.139 ERROR| traceback:0013| lxc-start: tools/lxc_start.c: main: 370 Additional information can be obtained by setting the --logfile and --logpriority options. 11/24 08:20:28.164 INFO | client:0570| Attempting refresh to obtain initial access_token 11/24 08:20:28.252 INFO | client:0872| Refreshing access_token 11/24 08:20:28.757 ERROR| autoserv:0759| Uncaught SystemExit with code 1 Traceback (most recent call last): File "/usr/local/autotest/server/autoserv", line 755, in main use_ssp) File "/usr/local/autotest/server/autoserv", line 562, in run_autoserv sys.exit(exit_code) SystemExit: 1 11/24 08:20:28.968 DEBUG| logging_manager:0627| Logging subprocess finished 11/24 08:20:28.968 DEBUG| logging_manager:0627| Logging subprocess finished Looking through crbug history, it looks like we have a long history of issues with the LXC containers. I couldn't find any more debugging information about what might be happening during lxc-start--those logs seem to be missing. Full log: https://00e9e64bac617124db3b03f056fa0a1f30d6eb2437ed5ae327-apidata.googleusercontent.com/download/storage/v1/b/chromeos-autotest-results/o/158364604-chromeos-test%2Fchromeos6-row3-rack12-host13%2Fssp_logs%2Fdebug%2Fautoserv.DEBUG?qk=AD5uMEu6fkUrT1AdsIlJKeFlZ2Z7TQnVNadZWOb1C7wrm3RADF88esaeBV6_fzHy1mINBuoRe4sbCl9rI_sK_pfRnIifMAP59OxmvsAaC6TNNGplP1vWBOKyzPbY3Xd89MD3195SqISKq31B1Vn7FgKl6J3kD3mooPt-VilC3yWSjrpWxbc4ussyQVuPM6sQCAoZm4N_Xwi65cZ4qWRsFHPLnGo3ezvN8oWWBSwJVK7nDFRZz79LV4Qj87welXBD8j29nGXLZEgZapDWeJh1DuZ7BtNLzJKMnbfAQBZHv08cfAA1SvSend38D3lZTC--KwhpNJnDPBO8g_9mcdEZlPbDUCQOYTkngulwvzQw-5cDn8hNhsxzEn8IH7uS_fUAd41LqVU0fOx5U96qV6dJergBlpHtonlil9dXGRESqPVnLyljAraeAFZlVKvZ3rC0vxMkv2YqKL3wxCMasCBLfxBwZrPUBRwsdfAGcW4skexiIqhsLeTatTDYRhgXehYRwJxNGC-_pTuNthy7EIswQbMyOVCjpdtU48BvdfiPsm4q63_XNDz0ik8B9J49vtMQUMgyMxWN031TYbRs5iN9Jx0upsCn9p5ioHLsxU4v_bzB8Hpnf5wmoAIUVw_Sez9JJm08usKx9gw2u0CDyITh2hx2fBWO8TqU5br00jHUWTm7UvzZFYqX9LC14HAC-YGll8xO7Z_jZLDKeixc3NuekUPawfMoRkZqmTECxSg2l6YI3TexqxQmdJ7jg-UaRk0xaNI0eQTPMf-JLTDKN_ZCxqq2IhotAPVQTcFjn3lYBio7YUd7Veq9wR3DDwjewcqZ0H9fP07PuhciVrRDkbb6pt7U4DY-857hYGCDBYvwoKmqkWZyqfkd95E
,
Nov 25 2017
,
Nov 25 2017
,
Nov 26 2017
Lab will be closed next Monday, why don't we restart the server? Assign this to deputy please.
,
Nov 27 2017
This one looks quite similar as well: https://bugs.chromium.org/p/chromium/issues/detail?id=788595 .
,
Nov 27 2017
And on ultima here, as well: https://luci-milo.appspot.com/buildbot/chromeos/ultima-release/1712 https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/158819963-chromeos-test/chromeos4-row11-rack5-host19/ssp_logs/debug/ . Assigning to deputy per #4.
,
Nov 27 2017
,
Nov 27 2017
Issue 788595 has been merged into this issue.
,
Nov 27 2017
Issue 787936 has been merged into this issue.
,
Nov 27 2017
server119.mtv is restarted. chromeos-test@chromeos-server119:/usr/local/autotest$ python site_utils/lxc/lxc_functional_test.py -v -s ... ... 2017-11-27 10:16:29,039.039 INFO |lxc_functional_tes:0354| MainThread(139654776600384)| All tests passed. This lxc issue should be fixed. Wait for next round of reef-paladin to verify.
,
Nov 27 2017
,
Nov 27 2017
Issue 788835 has been merged into this issue.
,
Nov 28 2017
Latest reef-paladin/release look good. |
|||||||
►
Sign in to add a comment |
|||||||
Comment 1 by jclinton@chromium.org
, Nov 25 2017