New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.
Starred by 5 users
Status: Verified
Owner:
Closed: Nov 28
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug

Blocking:
issue 788437



Sign in to add a comment
lxc-start failing in HWTest for electro and basking
Project Member Reported by jclinton@chromium.org, Nov 24 Back to list
Since Nov 22 (two days ago) HWTest for electro and basking are failing when they run their only AutoTest server test: provision_AutoUpdate.double. See failure history here:
https://uberchromegw.corp.google.com/i/chromeos/builders/reef-paladin

I've temporarily disabled CQ testing of electro and basking until this is resolved so this is urgent.

The failure is long before autotest is even downloaded to the LXC container:
11/24 08:18:58.955 DEBUG|             utils:0212| Running 'sudo lxc-start -P /usr/local/autotest/containers -n test_158364604_1511540257_21263 -d'
11/24 08:19:04.988 DEBUG|             utils:0212| Running 'sudo lxc-ls -P /usr/local/autotest/containers -f -F name,state'
11/24 08:19:05.781 DEBUG|             utils:0212| Running 'sudo test -e "/usr/local/autotest/containers/base_09/container_id.p"'
11/24 08:19:06.423 DEBUG|             utils:0212| Running 'sudo test -e "/usr/local/autotest/containers/ssh/container_id.p"'
11/24 08:19:07.070 DEBUG|             utils:0212| Running 'sudo test -e "/usr/local/autotest/containers/test_157447011_1511476601_7889/container_id.p"'
11/24 08:19:07.731 DEBUG|             utils:0212| Running 'sudo cat /usr/local/autotest/containers/test_157447011_1511476601_7889/container_id.p'
11/24 08:19:08.398 DEBUG|             utils:0212| Running 'sudo test -e "/usr/local/autotest/containers/test_157671071_1511295376_8966/container_id.p"'
11/24 08:19:09.047 DEBUG|             utils:0212| Running 'sudo test -e "/usr/local/autotest/containers/test_157671139_1511295376_8969/container_id.p"'
11/24 08:19:09.701 DEBUG|             utils:0212| Running 'sudo test -e "/usr/local/autotest/containers/test_157866481_1511475698_23835/container_id.p"'
11/24 08:19:10.370 DEBUG|             utils:0212| Running 'sudo cat /usr/local/autotest/containers/test_157866481_1511475698_23835/container_id.p'
11/24 08:19:11.043 DEBUG|             utils:0212| Running 'sudo test -e "/usr/local/autotest/containers/test_157866655_1511477757_9447/container_id.p"'
11/24 08:19:11.730 DEBUG|             utils:0212| Running 'sudo cat /usr/local/autotest/containers/test_157866655_1511477757_9447/container_id.p'
11/24 08:19:12.397 DEBUG|             utils:0212| Running 'sudo test -e "/usr/local/autotest/containers/test_157867245_1511493564_15798/container_id.p"'
11/24 08:19:13.083 DEBUG|             utils:0212| Running 'sudo cat /usr/local/autotest/containers/test_157867245_1511493564_15798/container_id.p'
...
11/24 08:20:24.068 DEBUG|         container:0368| Destroying container /usr/local/autotest/containers/test_158364604_1511540257_21263
11/24 08:20:24.070 DEBUG|             utils:0212| Running 'sudo lxc-destroy -P /usr/local/autotest/containers -n test_158364604_1511540257_21263 -f'
11/24 08:20:26.831 INFO |        server_job:0213| FAIL	----	----	timestamp=1511540426	localtime=Nov 24 08:20:26	Failed to setup container for test: Command <sudo lxc-start -P /usr/local/autotest/containers -n test_158364604_1511540257_21263 -d> failed, rc=1, Command returned non-zero exit status
  * Command: 
      sudo lxc-start -P /usr/local/autotest/containers -n
      test_158364604_1511540257_21263 -d
  Exit status: 1
  Duration: 5.97342014313
  
  stderr:
  lxc-start: tools/lxc_start.c: main: 366 The container failed to start.
  lxc-start: tools/lxc_start.c: main: 368 To get more details, run the container in foreground mode.
  lxc-start: tools/lxc_start.c: main: 370 Additional information can be obtained by setting the --logfile and --logpriority options.. Check logs in ssp_logs folder for more details.
11/24 08:20:26.833 DEBUG|             utils:0212| Running 'sudo -n chown -R 99158 "/usr/local/autotest/results/158364604-chromeos-test/chromeos6-row3-rack12-host13"'
11/24 08:20:27.483 DEBUG|             utils:0212| Running 'sudo -n chgrp -R 5000 "/usr/local/autotest/results/158364604-chromeos-test/chromeos6-row3-rack12-host13"'
11/24 08:20:28.124 ERROR|         traceback:0013| Traceback (most recent call last):
11/24 08:20:28.125 ERROR|         traceback:0013|   File "/usr/local/autotest/server/autoserv", line 507, in run_autoserv
11/24 08:20:28.126 ERROR|         traceback:0013|     machines)
11/24 08:20:28.126 ERROR|         traceback:0013|   File "/usr/local/autotest/server/autoserv", line 168, in _run_with_ssp
11/24 08:20:28.127 ERROR|         traceback:0013|     dut_name=dut_name)
11/24 08:20:28.127 ERROR|         traceback:0013|   File "/usr/local/autotest/site-packages/chromite/lib/metrics.py", line 483, in wrapper
11/24 08:20:28.128 ERROR|         traceback:0013|     return fn(*args, **kwargs)
11/24 08:20:28.129 ERROR|         traceback:0013|   File "/usr/local/autotest/site_utils/lxc/cleanup_if_fail.py", line 40, in func_cleanup_if_fail
11/24 08:20:28.129 ERROR|         traceback:0013|     return func(*args, **kwargs)
11/24 08:20:28.130 ERROR|         traceback:0013|   File "/usr/local/autotest/site_utils/lxc/container_bucket.py", line 186, in setup_test
11/24 08:20:28.130 ERROR|         traceback:0013|     container.start(wait_for_network=True)
11/24 08:20:28.131 ERROR|         traceback:0013|   File "/usr/local/autotest/site-packages/chromite/lib/metrics.py", line 483, in wrapper
11/24 08:20:28.131 ERROR|         traceback:0013|     return fn(*args, **kwargs)
11/24 08:20:28.132 ERROR|         traceback:0013|   File "/usr/local/autotest/site_utils/lxc/container.py", line 319, in start
11/24 08:20:28.133 ERROR|         traceback:0013|     output = utils.run(cmd).stdout
11/24 08:20:28.133 ERROR|         traceback:0013|   File "/usr/local/autotest/client/common_lib/utils.py", line 738, in run
11/24 08:20:28.135 ERROR|         traceback:0013|     "Command returned non-zero exit status")
11/24 08:20:28.136 ERROR|         traceback:0013| CmdError: Command <sudo lxc-start -P /usr/local/autotest/containers -n test_158364604_1511540257_21263 -d> failed, rc=1, Command returned non-zero exit status
11/24 08:20:28.136 ERROR|         traceback:0013| * Command: 
11/24 08:20:28.136 ERROR|         traceback:0013|     sudo lxc-start -P /usr/local/autotest/containers -n
11/24 08:20:28.137 ERROR|         traceback:0013|     test_158364604_1511540257_21263 -d
11/24 08:20:28.137 ERROR|         traceback:0013| Exit status: 1
11/24 08:20:28.137 ERROR|         traceback:0013| Duration: 5.97342014313
11/24 08:20:28.138 ERROR|         traceback:0013| 
11/24 08:20:28.138 ERROR|         traceback:0013| stderr:
11/24 08:20:28.138 ERROR|         traceback:0013| lxc-start: tools/lxc_start.c: main: 366 The container failed to start.
11/24 08:20:28.139 ERROR|         traceback:0013| lxc-start: tools/lxc_start.c: main: 368 To get more details, run the container in foreground mode.
11/24 08:20:28.139 ERROR|         traceback:0013| lxc-start: tools/lxc_start.c: main: 370 Additional information can be obtained by setting the --logfile and --logpriority options.
11/24 08:20:28.164 INFO |            client:0570| Attempting refresh to obtain initial access_token
11/24 08:20:28.252 INFO |            client:0872| Refreshing access_token
11/24 08:20:28.757 ERROR|          autoserv:0759| Uncaught SystemExit with code 1
Traceback (most recent call last):
  File "/usr/local/autotest/server/autoserv", line 755, in main
    use_ssp)
  File "/usr/local/autotest/server/autoserv", line 562, in run_autoserv
    sys.exit(exit_code)
SystemExit: 1
11/24 08:20:28.968 DEBUG|   logging_manager:0627| Logging subprocess finished
11/24 08:20:28.968 DEBUG|   logging_manager:0627| Logging subprocess finished

Looking through crbug history, it looks like we have a long history of issues with the LXC containers. I couldn't find any more debugging information about what might be happening during lxc-start--those logs seem to be missing.

Full log:
https://00e9e64bac617124db3b03f056fa0a1f30d6eb2437ed5ae327-apidata.googleusercontent.com/download/storage/v1/b/chromeos-autotest-results/o/158364604-chromeos-test%2Fchromeos6-row3-rack12-host13%2Fssp_logs%2Fdebug%2Fautoserv.DEBUG?qk=AD5uMEu6fkUrT1AdsIlJKeFlZ2Z7TQnVNadZWOb1C7wrm3RADF88esaeBV6_fzHy1mINBuoRe4sbCl9rI_sK_pfRnIifMAP59OxmvsAaC6TNNGplP1vWBOKyzPbY3Xd89MD3195SqISKq31B1Vn7FgKl6J3kD3mooPt-VilC3yWSjrpWxbc4ussyQVuPM6sQCAoZm4N_Xwi65cZ4qWRsFHPLnGo3ezvN8oWWBSwJVK7nDFRZz79LV4Qj87welXBD8j29nGXLZEgZapDWeJh1DuZ7BtNLzJKMnbfAQBZHv08cfAA1SvSend38D3lZTC--KwhpNJnDPBO8g_9mcdEZlPbDUCQOYTkngulwvzQw-5cDn8hNhsxzEn8IH7uS_fUAd41LqVU0fOx5U96qV6dJergBlpHtonlil9dXGRESqPVnLyljAraeAFZlVKvZ3rC0vxMkv2YqKL3wxCMasCBLfxBwZrPUBRwsdfAGcW4skexiIqhsLeTatTDYRhgXehYRwJxNGC-_pTuNthy7EIswQbMyOVCjpdtU48BvdfiPsm4q63_XNDz0ik8B9J49vtMQUMgyMxWN031TYbRs5iN9Jx0upsCn9p5ioHLsxU4v_bzB8Hpnf5wmoAIUVw_Sez9JJm08usKx9gw2u0CDyITh2hx2fBWO8TqU5br00jHUWTm7UvzZFYqX9LC14HAC-YGll8xO7Z_jZLDKeixc3NuekUPawfMoRkZqmTECxSg2l6YI3TexqxQmdJ7jg-UaRk0xaNI0eQTPMf-JLTDKN_ZCxqq2IhotAPVQTcFjn3lYBio7YUd7Veq9wR3DDwjewcqZ0H9fP07PuhciVrRDkbb6pt7U4DY-857hYGCDBYvwoKmqkWZyqfkd95E


 
Cc: xixuan@chromium.org nxia@chromium.org
Cc: dshi@chromium.org
Owner: kenobi@chromium.org
this is also found on pfq  crbug.com/787936 
Blocking: 788437
Lab will be closed next Monday, why don't we restart the server? Assign
this to deputy please.
This one looks quite similar as well: https://bugs.chromium.org/p/chromium/issues/detail?id=788595 .
Owner: xixuan@chromium.org
Cc: athilenius@chromium.org slavamn@chromium.org kenobi@chromium.org
 Issue 788595  has been merged into this issue.
Cc: davidri...@chromium.org ihf@chromium.org n...@chromiu.org achuith@chromium.org
 Issue 787936  has been merged into this issue.
server119.mtv is restarted.

chromeos-test@chromeos-server119:/usr/local/autotest$ python site_utils/lxc/lxc_functional_test.py -v -s
...
...
2017-11-27 10:16:29,039.039 INFO |lxc_functional_tes:0354|       MainThread(139654776600384)| All tests passed.

This lxc issue should be fixed. Wait for next round of reef-paladin to verify.
Cc: cernekee@chromium.org
 Issue 788437  has been merged into this issue.
Issue 788835 has been merged into this issue.
Status: Verified
Latest reef-paladin/release look good.
Sign in to add a comment