push to prod broken by LXC
Issue description
Looks like push to prod is currently broken: it fails because several SERVER_JOBs fail.
More specifically, from chromeos-staging-master2:/var/log/test_push/test_push.log:
dummy_Pass.bluetooth GOOD
gandof-release/R64-10176.65.0/push_to_prod/provision_AutoUpdate.double_SERVER_JOB FAIL
dummy_Fail.Fail FAIL
dummy_Fail.Warn WARN
gandof-release/R64-10176.65.0/push_to_prod/dummy_PassServer.ssp_SERVER_JOB FAIL
dummy_Pass GOOD
gandof-release/R64-10176.65.0/push_to_prod/autotest_SyncCount_SERVER_JOB FAIL
dummy_Fail.Crash GOOD
dummy_Fail.dependency TEST_NA
dummy_Fail.Error ERROR
dummy_Fail.RetryFail FAIL
dummy_Fail.NAError TEST_NA
dummy_Pass.actionable GOOD
login_LoginSuccess GOOD
dummy_Fail.RetrySuccess GOOD
SERVER_JOB GOOD
Compare this with EXPECTED_TEST_RESULTS in autotest's test_push.py:
EXPECTED_TEST_RESULTS = {'^SERVER_JOB$': 'GOOD',
                         # This is related to dummy_Fail/control.dependency.
                         'dummy_Fail.dependency$': 'TEST_NA',
                         'login_LoginSuccess.*': 'GOOD',
                         'provision_AutoUpdate.double': 'GOOD',
                         'dummy_Pass.*': 'GOOD',
                         'dummy_Fail.Fail$': 'FAIL',
                         'dummy_Fail.RetryFail$': 'FAIL',
                         'dummy_Fail.RetrySuccess': 'GOOD',
                         'dummy_Fail.Error$': 'ERROR',
                         'dummy_Fail.Warn$': 'WARN',
                         'dummy_Fail.NAError$': 'TEST_NA',
                         'dummy_Fail.Crash$': 'GOOD',
                         'autotest_SyncCount$': 'GOOD',
                        }
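To make the mismatch concrete, here is a minimal sketch that checks a few of the log lines above against a subset of these patterns. It is a hypothetical reimplementation, not test_push.py's actual logic: it assumes each pattern is applied with re.search to every test name, and that a pattern fails if nothing matches it or if a matching test has a different status. The names parse_results and check_results are made up for illustration.

```python
import re

# Subset of EXPECTED_TEST_RESULTS quoted above.
EXPECTED_TEST_RESULTS = {
    '^SERVER_JOB$': 'GOOD',
    'provision_AutoUpdate.double': 'GOOD',
    'dummy_Pass.*': 'GOOD',
    'autotest_SyncCount$': 'GOOD',
}

# Lines from test_push.log, with a space between each test name and status.
LOG = """\
dummy_Pass.bluetooth GOOD
gandof-release/R64-10176.65.0/push_to_prod/provision_AutoUpdate.double_SERVER_JOB FAIL
gandof-release/R64-10176.65.0/push_to_prod/dummy_PassServer.ssp_SERVER_JOB FAIL
gandof-release/R64-10176.65.0/push_to_prod/autotest_SyncCount_SERVER_JOB FAIL
SERVER_JOB GOOD
"""


def parse_results(log_text):
    """Parse 'name STATUS' lines into a {test name: status} dict."""
    results = {}
    for line in log_text.splitlines():
        if line.strip():
            name, status = line.rsplit(None, 1)
            results[name] = status
    return results


def check_results(results, expected):
    """Return (pattern, problem) pairs for every expectation that is not met.

    Hypothetical rule (an assumption, not test_push.py's code): a pattern
    fails if no test name matches it, or if a matching test has the wrong
    status.
    """
    problems = []
    for pattern, want in sorted(expected.items()):
        matched = sorted(name for name in results if re.search(pattern, name))
        if not matched:
            problems.append((pattern, 'no matching test'))
        for name in matched:
            if results[name] != want:
                problems.append(
                    (pattern, '%s is %s, expected %s' % (name, results[name], want)))
    return problems


for pattern, problem in check_results(parse_results(LOG), EXPECTED_TEST_RESULTS):
    print('%s: %s' % (pattern, problem))
```

With these lines the sketch reports three problems: 'autotest_SyncCount$' matches nothing (only the ..._SERVER_JOB variant exists), and 'dummy_Pass.*' and 'provision_AutoUpdate.double' each match a FAIL entry.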
Digging deeper, the logs can be retrieved here:
http://chromeos-staging-master2.hot/tko/retrieve_logs.cgi?job=/results/19041-chromeos-test/
http://chromeos-staging-master2.hot/tko/retrieve_logs.cgi?job=/results/19036-chromeos-test/
which shows:
06/12 13:29:53.860 INFO | autoserv:0734| Results placed in /usr/local/autotest/results/19041-chromeos-test/chromeos2-row1-rack2-host5
06/12 13:29:53.864 INFO | pidfile:0016| Logged pid 145284 to /usr/local/autotest/results/19041-chromeos-test/chromeos2-row1-rack2-host5/.autoserv_execute
06/12 13:29:53.990 INFO | server_job:1519| Shadowing AFE store with a FileStore at /usr/local/autotest/results/19041-chromeos-test/chromeos2-row1-rack2-host5/host_info_store/dir_385e7b6f-d0c8-4cbc-86
06/12 13:29:54.021 INFO | connectionpool:0207| Starting new HTTP connection (1): metadata.google.internal
06/12 13:29:54.138 NOTIC| cros_logging:0038| ts_mon was set up.
06/12 13:30:08.999 INFO | server_job:0218| FAIL ---- ---- timestamp=1528835408 localtime=Jun 12 13:30:08 Failed to setup container for test: 'NoneType' object has no attribute '__geti
06/12 13:30:09.640 ERROR| traceback:0013| Traceback (most recent call last):
06/12 13:30:09.641 ERROR| traceback:0013| File "/usr/local/autotest/server/autoserv", line 578, in run_autoserv
06/12 13:30:09.641 ERROR| traceback:0013| machines)
06/12 13:30:09.641 ERROR| traceback:0013| File "/usr/local/autotest/server/autoserv", line 175, in _run_with_ssp
06/12 13:30:09.641 ERROR| traceback:0013| dut_name=dut_name)
06/12 13:30:09.641 ERROR| traceback:0013| File "/usr/local/autotest/site-packages/chromite/lib/metrics.py", line 493, in wrapper
06/12 13:30:09.642 ERROR| traceback:0013| return fn(*args, **kwargs)
06/12 13:30:09.642 ERROR| traceback:0013| File "/usr/local/autotest/site_utils/lxc/cleanup_if_fail.py", line 40, in func_cleanup_if_fail
06/12 13:30:09.643 ERROR| traceback:0013| return func(*args, **kwargs)
06/12 13:30:09.643 ERROR| traceback:0013| File "/usr/local/autotest/site_utils/lxc/container_bucket.py", line 207, in setup_test
06/12 13:30:09.643 ERROR| traceback:0013| deploy_config_manager = lxc_config.DeployConfigManager(container)
06/12 13:30:09.644 ERROR| traceback:0013| File "/usr/local/autotest/site_utils/lxc/config.py", line 231, in __init__
06/12 13:30:09.644 ERROR| traceback:0013| tmp_append = os.path.join(self.container.rootfs,
06/12 13:30:09.644 ERROR| traceback:0013| File "/usr/local/autotest/site_utils/lxc/container.py", line 300, in rootfs
06/12 13:30:09.645 ERROR| traceback:0013| if self._LXC_VERSION[0] >= 3:
06/12 13:30:09.645 ERROR| traceback:0013| TypeError: 'NoneType' object has no attribute '__getitem__'
06/12 13:30:09.657 INFO | client:0570| Attempting refresh to obtain initial access_token
06/12 13:30:09.718 INFO | client:0872| Refreshing access_token
06/12 13:30:09.863 ERROR| autoserv:0809| Uncaught SystemExit with code 1
Traceback (most recent call last):
File "/usr/local/autotest/server/autoserv", line 805, in main
use_ssp)
File "/usr/local/autotest/server/autoserv", line 627, in run_autoserv
sys.exit(exit_code)
SystemExit: 1
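The last frame of the traceback indexes self._LXC_VERSION while it is still None. A minimal reduction of that state, assuming only what the traceback shows (a class-level _LXC_VERSION that nothing has assigned, read by the rootfs property), looks like this. The return values of rootfs are placeholders, not the real path logic, and reproduce is a hypothetical helper.

```python
class Container(object):
    # Class-level cache of the LXC version; nothing assigns it here,
    # mirroring the state in the traceback.
    _LXC_VERSION = None

    def __init__(self, container_path):
        self.container_path = container_path

    @property
    def rootfs(self):
        # Same comparison as the failing line in container.py.
        if self._LXC_VERSION[0] >= 3:
            return 'LXC 3+ rootfs'    # placeholder, not the real path logic
        return 'pre-LXC 3 rootfs'     # placeholder, not the real path logic


def reproduce():
    """Return the TypeError message raised by rootfs, or None if no error."""
    try:
        Container('/tmp/hypothetical-container').rootfs
    except TypeError as e:
        return str(e)
    return None


print(reproduce())
```

Python 2 words the error as in the log ("'NoneType' object has no attribute '__getitem__'"); Python 3 reports "'NoneType' object is not subscriptable" for the same bug.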
Jun 12 2018
Reverted and chumped.
Jun 12 2018
Looks like some of the servers don't have lxc-info on them? Is this intentional or is this the real bug?

chromeos-test@chromeos-gt-devserver17:~$ lxc-info --version
E: command-not-found is currently not supported on gLinux. See <http://b/16150412> for more information.
lxc-info: command not found
Jun 12 2018
Neither, I believe. If I remember correctly, it's a change in how lxc info is called from version to version.
Jun 12 2018
jkop@, can you explain a bit more?

chromeos-test@chromeos-gt-devserver17:~$ lxc-info --version
E: command-not-found is currently not supported on gLinux. See <http://b/16150412> for more information.
lxc-info: command not found

This seems to suggest the executable is not installed.
Jun 12 2018
That's a devserver, why would it have LXC in the first place?
Jun 12 2018
Sorry, I don't know how the lab is set up; I only know Moblab, where everything is on one box. Perhaps you can suggest the corrective action: if we need to determine the correct version of LXC when the rootfs property is accessed, how should that be done?
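One possible answer, sketched under assumptions: detect the version lazily the first time rootfs needs it and cache it on the class, so hosts without LXC only fail when they actually need a container. This is not how container.py does it. The helper names (parse_lxc_version, detect_lxc_version, lxc_version, _version_detector) are hypothetical, and the sketch assumes `lxc-info --version` prints a dotted version number such as 3.0.1, which may not hold for every LXC release.

```python
import re
import subprocess


def parse_lxc_version(text):
    """Turn version output such as '3.0.1' into a tuple like (3, 0, 1)."""
    match = re.match(r'\s*(\d+(?:\.\d+)*)', text)
    if match is None:
        raise ValueError('unrecognized LXC version output: %r' % text)
    return tuple(int(part) for part in match.group(1).split('.'))


def detect_lxc_version():
    """Ask the local LXC tools for their version.

    Assumes lxc-info is on PATH; raises RuntimeError otherwise (as on a
    host where the command is not installed).
    """
    try:
        output = subprocess.check_output(['lxc-info', '--version'])
    except (OSError, subprocess.CalledProcessError) as e:
        raise RuntimeError('cannot determine LXC version: %s' % e)
    return parse_lxc_version(output.decode('utf-8'))


class Container(object):
    # Class-level cache shared by every instance; filled on first use.
    _LXC_VERSION = None
    # Hypothetical hook so the detector can be swapped out in tests.
    _version_detector = staticmethod(detect_lxc_version)

    @classmethod
    def lxc_version(cls):
        """Return the cached LXC version, detecting it on first access."""
        if cls._LXC_VERSION is None:
            cls._LXC_VERSION = cls._version_detector()
        return cls._LXC_VERSION

    @property
    def rootfs(self):
        # Placeholder return values; the real path logic is not in this thread.
        if self.lxc_version()[0] >= 3:
            return 'LXC 3+ rootfs'
        return 'pre-LXC 3 rootfs'
```

Because the cache is written through cls rather than self, a classmethod such as create_from_existing_dir and every instance see the same value after the first lookup.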
Jun 12 2018
I believe what happened here was that Container.create_from_existing_dir was called before Container.__init__ had ever been called. Or, actually: Container.__init__ does not set the class-level property, only the object-level one. So that value will always be unset when create_from_existing_dir is called.
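The class-level versus object-level distinction can be shown in isolation. In this sketch the classes, the version_seen_by_classmethod helper, and the (3, 0, 1) value are all hypothetical; it only demonstrates that assigning self._LXC_VERSION in __init__ creates an instance attribute, so a classmethod reading cls._LXC_VERSION still sees None, whereas assigning through type(self) updates the class.

```python
class Container(object):
    _LXC_VERSION = None  # class-level attribute

    def __init__(self):
        # Binds a new attribute on this instance only; the class-level
        # value read by classmethods is untouched.
        self._LXC_VERSION = (3, 0, 1)  # hypothetical detected version

    @classmethod
    def version_seen_by_classmethod(cls):
        return cls._LXC_VERSION


class FixedContainer(object):
    _LXC_VERSION = None

    def __init__(self):
        # Assigning through the class updates the shared value.
        type(self)._LXC_VERSION = (3, 0, 1)  # hypothetical detected version

    @classmethod
    def version_seen_by_classmethod(cls):
        return cls._LXC_VERSION


Container()
FixedContainer()
print(Container.version_seen_by_classmethod())       # None
print(FixedContainer.version_seen_by_classmethod())  # (3, 0, 1)
```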
Jun 13 2018
I am not sure I understand. __init__ will always be called when cls() is called, so I think it should be fine in the case of Container.create_from_existing_dir.
This code will certainly not work, however, if the machine does not have LXC installed.

# test.py
class A(object):
    def __init__(self):
        print("Init Called")

    @classmethod
    def create(cls):
        print("Create Called")
        return cls()


print("Test 1")
a = A()
print("Test 2")
b = A.create()

python test.py
Test 1
Init Called
Test 2
Create Called
Init Called
Jun 13 2018
That code works, because it's different from the real code. Here's a snippet that shows the difference:
class A(object):
    def __init__(self):
        print("Init Called")

    @classmethod
    def create(cls):
        print("Create Called")
        return cls()


print("Test 1")
print("Test 2")
b = A.create()

python test.py
Test 1
Test 2
Create Called
Init Called

And if you change A.create to match the real code more closely:

    @classmethod
    def create(cls):
        print("Create Called")
        return True

then the output you get is:

python test.py
Test 1
Test 2
Create Called
Jun 13 2018
Jun 13 2018
Sorry if I am just confused. Can you point to the code you are looking at? I was trying to replicate this: https://cs.corp.google.com/chromeos_public/src/third_party/autotest/files/site_utils/lxc/container.py?rcl=6ac1344dba6c0395a8725033521e4dd0d192f0be&l=193
Jun 13 2018
That's not the method that's broken. That's not even being called.
Jun 13 2018
@craigb, which log contains the excerpt you noted? Which machine was executing it? There's a whole lot of words there but the important context is missing.
Jun 13 2018
OK, I misunderstood your comments in #9 then. Can you explain them some more?
Jun 13 2018
Push to prod succeeded
Comment 1 by cra...@chromium.org, Jun 12 2018