
Issue 852158

Starred by 1 user

Issue metadata

Status: Fixed
Owner:
Closed: Jun 2018
Cc:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 3
Type: Bug




push to prod broken by LXC

Project Member Reported by cra...@chromium.org, Jun 12 2018

Issue description

Looks like push to prod is currently broken. It's failing because several SERVER_JOBs fail.

More specifically, from chromeos-staging-master2:/var/log/test_push/test_push.log:


dummy_Pass.bluetooth          GOOD
gandof-release/R64-10176.65.0/push_to_prod/provision_AutoUpdate.double_SERVER_JOB FAIL
dummy_Fail.Fail               FAIL
dummy_Fail.Warn               WARN
gandof-release/R64-10176.65.0/push_to_prod/dummy_PassServer.ssp_SERVER_JOB FAIL
dummy_Pass                    GOOD
gandof-release/R64-10176.65.0/push_to_prod/autotest_SyncCount_SERVER_JOB FAIL
dummy_Fail.Crash              GOOD
dummy_Fail.dependency         TEST_NA
dummy_Fail.Error              ERROR
dummy_Fail.RetryFail          FAIL
dummy_Fail.NAError            TEST_NA
dummy_Pass.actionable         GOOD
login_LoginSuccess            GOOD
dummy_Fail.RetrySuccess       GOOD
SERVER_JOB                    GOOD


Compare this to autotest's test_push.py's EXPECTED_TEST_RESULTS:
EXPECTED_TEST_RESULTS = {'^SERVER_JOB$':                 'GOOD',
                         # This is related to dummy_Fail/control.dependency.
                         'dummy_Fail.dependency$':       'TEST_NA',
                         'login_LoginSuccess.*':         'GOOD',
                         'provision_AutoUpdate.double':  'GOOD',
                         'dummy_Pass.*':                 'GOOD',
                         'dummy_Fail.Fail$':             'FAIL',
                         'dummy_Fail.RetryFail$':        'FAIL',
                         'dummy_Fail.RetrySuccess':      'GOOD',
                         'dummy_Fail.Error$':            'ERROR',
                         'dummy_Fail.Warn$':             'WARN',
                         'dummy_Fail.NAError$':          'TEST_NA',
                         'dummy_Fail.Crash$':            'GOOD',
                         'autotest_SyncCount$':          'GOOD',
                         }
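For reference, a rough sketch of that comparison, assuming each pattern is applied with re.search to every test name in the summary and that any test it matches must report the expected status. find_mismatches is a hypothetical helper, not the code in test_push.py, and the data is only a subset of the log and dict above.

```python
import re

# Subset of EXPECTED_TEST_RESULTS from test_push.py (copied from above).
EXPECTED_TEST_RESULTS = {
    '^SERVER_JOB$': 'GOOD',
    'provision_AutoUpdate.double': 'GOOD',
    'dummy_Pass.*': 'GOOD',
    'dummy_Fail.Fail$': 'FAIL',
}

# Subset of the test_push.log summary above: test name -> reported status.
PREFIX = 'gandof-release/R64-10176.65.0/push_to_prod/'
ACTUAL_RESULTS = {
    'dummy_Pass.bluetooth': 'GOOD',
    PREFIX + 'provision_AutoUpdate.double_SERVER_JOB': 'FAIL',
    'dummy_Fail.Fail': 'FAIL',
    PREFIX + 'dummy_PassServer.ssp_SERVER_JOB': 'FAIL',
    'dummy_Pass': 'GOOD',
    'SERVER_JOB': 'GOOD',
}


def find_mismatches(expected, actual):
    """Return (pattern, name, wanted, got) for each matched test whose status differs."""
    mismatches = []
    for pattern, wanted in sorted(expected.items()):
        for name, got in sorted(actual.items()):
            # Assumed rule: re.search, so unanchored patterns match anywhere.
            if re.search(pattern, name) and got != wanted:
                mismatches.append((pattern, name, wanted, got))
    return mismatches


for pattern, name, wanted, got in find_mismatches(EXPECTED_TEST_RESULTS,
                                                  ACTUAL_RESULTS):
    print('%s: expected %s, got %s (pattern %r)' % (name, wanted, got, pattern))
```

Under this assumed rule the top-level ^SERVER_JOB$ check still passes; the failures only show up through the per-test ..._SERVER_JOB entries that provision_AutoUpdate.double and dummy_Pass.* happen to match.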

Digging deeper, the logs can be retrieved here:
http://chromeos-staging-master2.hot/tko/retrieve_logs.cgi?job=/results/19041-chromeos-test/
http://chromeos-staging-master2.hot/tko/retrieve_logs.cgi?job=/results/19036-chromeos-test/

which shows:

06/12 13:29:53.860 INFO |          autoserv:0734| Results placed in /usr/local/autotest/results/19041-chromeos-test/chromeos2-row1-rack2-host5
06/12 13:29:53.864 INFO |           pidfile:0016| Logged pid 145284 to /usr/local/autotest/results/19041-chromeos-test/chromeos2-row1-rack2-host5/.autoserv_execute
06/12 13:29:53.990 INFO |        server_job:1519| Shadowing AFE store with a FileStore at /usr/local/autotest/results/19041-chromeos-test/chromeos2-row1-rack2-host5/host_info_store/dir_385e7b6f-d0c8-4cbc-86
06/12 13:29:54.021 INFO |    connectionpool:0207| Starting new HTTP connection (1): metadata.google.internal
06/12 13:29:54.138 NOTIC|      cros_logging:0038| ts_mon was set up.
06/12 13:30:08.999 INFO |        server_job:0218| FAIL  ----    ----    timestamp=1528835408    localtime=Jun 12 13:30:08       Failed to setup container for test: 'NoneType' object has no attribute '__geti
06/12 13:30:09.640 ERROR|         traceback:0013| Traceback (most recent call last):
06/12 13:30:09.641 ERROR|         traceback:0013|   File "/usr/local/autotest/server/autoserv", line 578, in run_autoserv
06/12 13:30:09.641 ERROR|         traceback:0013|     machines)
06/12 13:30:09.641 ERROR|         traceback:0013|   File "/usr/local/autotest/server/autoserv", line 175, in _run_with_ssp
06/12 13:30:09.641 ERROR|         traceback:0013|     dut_name=dut_name)
06/12 13:30:09.641 ERROR|         traceback:0013|   File "/usr/local/autotest/site-packages/chromite/lib/metrics.py", line 493, in wrapper
06/12 13:30:09.642 ERROR|         traceback:0013|     return fn(*args, **kwargs)
06/12 13:30:09.642 ERROR|         traceback:0013|   File "/usr/local/autotest/site_utils/lxc/cleanup_if_fail.py", line 40, in func_cleanup_if_fail
06/12 13:30:09.643 ERROR|         traceback:0013|     return func(*args, **kwargs)
06/12 13:30:09.643 ERROR|         traceback:0013|   File "/usr/local/autotest/site_utils/lxc/container_bucket.py", line 207, in setup_test
06/12 13:30:09.643 ERROR|         traceback:0013|     deploy_config_manager = lxc_config.DeployConfigManager(container)
06/12 13:30:09.644 ERROR|         traceback:0013|   File "/usr/local/autotest/site_utils/lxc/config.py", line 231, in __init__
06/12 13:30:09.644 ERROR|         traceback:0013|     tmp_append = os.path.join(self.container.rootfs,
06/12 13:30:09.644 ERROR|         traceback:0013|   File "/usr/local/autotest/site_utils/lxc/container.py", line 300, in rootfs
06/12 13:30:09.645 ERROR|         traceback:0013|     if self._LXC_VERSION[0] >= 3:
06/12 13:30:09.645 ERROR|         traceback:0013| TypeError: 'NoneType' object has no attribute '__getitem__'
06/12 13:30:09.657 INFO |            client:0570| Attempting refresh to obtain initial access_token
06/12 13:30:09.718 INFO |            client:0872| Refreshing access_token
06/12 13:30:09.863 ERROR|          autoserv:0809| Uncaught SystemExit with code 1
Traceback (most recent call last):
  File "/usr/local/autotest/server/autoserv", line 805, in main
    use_ssp)
  File "/usr/local/autotest/server/autoserv", line 627, in run_autoserv
    sys.exit(exit_code)
SystemExit: 1
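The last frames of that traceback reduce to indexing a None value inside a property. A minimal reproduction, assuming _LXC_VERSION is a class attribute that was never populated; FakeContainer and its return strings are placeholders, not autotest code.

```python
class FakeContainer(object):
    """Placeholder mirroring only the failing check in container.py."""

    # Class-level version cache that nothing has filled in yet.
    _LXC_VERSION = None

    @property
    def rootfs(self):
        # Same comparison as container.py line 300 in the traceback.
        if self._LXC_VERSION[0] >= 3:
            return 'rootfs (LXC 3.x)'  # placeholder value
        return 'rootfs (LXC 2.x)'  # placeholder value


try:
    FakeContainer().rootfs
except TypeError as err:
    # Python 2: "'NoneType' object has no attribute '__getitem__'"
    # Python 3: "'NoneType' object is not subscriptable"
    print('TypeError: %s' % err)
```

Under Python 2 this prints the same message as the log; Python 3 words the same TypeError differently.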


Comment 1 by cra...@chromium.org, Jun 12 2018

This change looks suspicious:

commit 8c182fff168e3903bc13a8613a361a0431e80e40
Author: Keith Haddow <haddowk@chromium.org>
Date:   Thu May 17 10:43:24 2018 -0700

    [autotest] Change to make lxc-info call work with lxc 3.0.0
    
    There is a change to the config syntax between lxc 2.x.x and
    lxc 3.x.x, determine the version of lxc we are running and
    make the correct call.
    
    I did it this way rather than just catch the error as I feel
    that version differences are going to continue to happen and this
    should make it easier to make future changes.
    
    BUG= chromium:844050 
    TEST=tested on moblab running lxc.3.x and tryjobs
    
    Change-Id: Ib605a78858c13e464a92dcbbb63668a3b5307f54
    Reviewed-on: https://chromium-review.googlesource.com/1064690
    Commit-Ready: Keith Haddow <haddowk@chromium.org>
    Tested-by: Keith Haddow <haddowk@chromium.org>
    Reviewed-by: Jacob Kopczynski <jkop@chromium.org>
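The approach the commit describes (read the LXC version, then pick the matching lxc-info call) might look roughly like this. It assumes the config difference in question is the rootfs key, lxc.rootfs in 2.x versus lxc.rootfs.path in 3.x; all helper names, the container name, and the path are hypothetical, and this is not the code from the reviewed change.

```python
import re


def parse_lxc_version(text):
    """Parse 'lxc-info --version' output such as '3.0.1' into (3, 0, 1)."""
    match = re.match(r'(\d+)\.(\d+)(?:\.(\d+))?', text.strip())
    if match is None:
        raise ValueError('unrecognized LXC version: %r' % text)
    return tuple(int(g) for g in match.groups() if g is not None)


def rootfs_config_key(version):
    """Assumed key names: lxc.rootfs (LXC 2.x) vs lxc.rootfs.path (LXC 3.x)."""
    return 'lxc.rootfs.path' if version[0] >= 3 else 'lxc.rootfs'


def rootfs_query_command(name, lxcpath, version):
    """Hypothetical lxc-info invocation that prints one config key."""
    return ['lxc-info', '-P', lxcpath, '-n', name,
            '-c', rootfs_config_key(version)]


# Placeholder container name and lxcpath, for illustration only.
print(rootfs_query_command('my_container', '/tmp/lxc',
                           parse_lxc_version('3.0.1\n')))
```

Keeping the version check in one small function is what the commit message argues for: the next syntax change only touches rootfs_config_key.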


Comment 2 by jkop@chromium.org, Jun 12 2018

Owner: jkop@chromium.org
Status: Started (was: Untriaged)
Reverted and chumped.

Comment 3 Deleted

Comment 4 by cra...@chromium.org, Jun 12 2018

Looks like some of the servers don't have lxc-info on them?  Is this intentional or is this the real bug?

chromeos-test@chromeos-gt-devserver17:~$ lxc-info --version
E: command-not-found is currently not supported on gLinux. See <http://b/16150412> for more information.
lxc-info: command not found
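On a host like this, code that shells out to lxc-info has nothing to parse. A minimal Python 3 sketch of checking for the binary first (shutil.which does not exist in Python 2); detect_lxc_version is hypothetical, and its which/run parameters exist only so it can be exercised on a machine without LXC.

```python
import shutil
import subprocess


def detect_lxc_version(which=shutil.which, run=subprocess.check_output):
    """Return lxc-info's version string, or None if lxc-info is not on PATH."""
    path = which('lxc-info')
    if path is None:
        # The chromeos-gt-devserver17 case above: "lxc-info: command not found".
        return None
    return run([path, '--version']).decode().strip()


version = detect_lxc_version()
print('lxc-info version: %s' % (version or 'not installed'))
```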

Comment 5 by jkop@chromium.org, Jun 12 2018

Neither, I believe. If I remember correctly, it's a change in how lxc info is called from version to version.

jkop@, can you explain a bit more?

Running lxc-info --version on chromeos-gt-devserver17 (comment 4) returns "lxc-info: command not found", which seems to suggest the executable is not installed.

Comment 7 by jkop@chromium.org, Jun 12 2018

That's a devserver, why would it have LXC in the first place?

Cc: cra...@chromium.org dshi@chromium.org stephenlin@chromium.org
Sorry, I don't know how the lab is set up; I only know Moblab, where everything is on one box.

Perhaps you can suggest the corrective action: if we need to determine the correct LXC version when the rootfs property is accessed, how should that be done?
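One possible answer, sketched under assumptions: look the version up lazily on first use, cache it on the class rather than on the instance, and fail with an explicit error when lxc-info is missing instead of leaving None behind. ContainerSketch and LxcNotInstalledError are hypothetical names, not autotest classes, and the run parameter exists only for testing.

```python
import subprocess


class LxcNotInstalledError(Exception):
    """Raised when the host has no usable lxc-info binary."""


class ContainerSketch(object):
    # Shared by all instances; filled in on first use rather than in __init__.
    _LXC_VERSION = None

    @classmethod
    def lxc_version(cls, run=subprocess.check_output):
        """Return (major, minor), running lxc-info only on the first call."""
        if cls._LXC_VERSION is None:
            try:
                output = run(['lxc-info', '--version'])
            except OSError:
                # Missing binary: fail loudly instead of caching None.
                raise LxcNotInstalledError('lxc-info not found; is LXC installed?')
            major, minor = output.decode().strip().split('.')[:2]
            cls._LXC_VERSION = (int(major), int(minor))
        return cls._LXC_VERSION

    def uses_lxc3_config(self):
        # Reads the class-level cache, so it also works for an instance
        # that never went through __init__.
        return self.lxc_version()[0] >= 3
```

The design choice here is that a host without LXC produces a named error at the point of use, rather than a TypeError several frames later.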

Comment 9 by jkop@chromium.org, Jun 12 2018

I believe what happened here was that Container.create_from_existing_dir was called before Container.__init__ had ever been called.

Or, actually: Container.__init__ does not set the class-level property, only the object-level one. So that value will always be unset when create_from_existing_dir is called.
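To illustrate the class-level versus object-level distinction above: assigning through self in __init__ creates an instance attribute that shadows the class attribute, so anything reading the class-level value still sees None. PerInstance and PerClass are hypothetical stand-ins; whether autotest's Container actually assigns _LXC_VERSION this way is the hypothesis under discussion, not something this sketch confirms.

```python
class PerInstance(object):
    _LXC_VERSION = None  # class-level attribute

    def __init__(self, version):
        # Assigning through self creates an instance attribute that shadows
        # the class attribute; PerInstance._LXC_VERSION itself stays None.
        self._LXC_VERSION = version


class PerClass(object):
    _LXC_VERSION = None

    def __init__(self, version):
        # Assigning through the class updates the value every instance sees.
        type(self)._LXC_VERSION = version


c = PerInstance((3, 0, 1))
print(c._LXC_VERSION)                                # (3, 0, 1)
print(PerInstance._LXC_VERSION)                      # None
# An instance created without running __init__ falls back to the class value.
print(PerInstance.__new__(PerInstance)._LXC_VERSION)  # None

PerClass((3, 0, 1))
print(PerClass._LXC_VERSION)                         # (3, 0, 1)
```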

I am not sure I understand. __init__ will always be called when cls() is called, so I think it should be fine in the case of Container.create_from_existing_dir.

This code will certainly not work, however, if the machine does not have LXC installed.

class A(object):

  def __init__(self):
    print "Init Called"

  @classmethod
  def create(cls):
    print "Create Called"
    return cls()


print "Test 1"
a = A()

print "Test 2"
b = A.create()

python test.py
Test 1
Init Called
Test 2
Create Called
Init Called


Comment 11 by jkop@chromium.org, Jun 13 2018

That code works, because it's different from the real code. Here's a snippet that shows the difference:

class A(object):

  def __init__(self):
    print "Init Called"

  @classmethod
  def create(cls):
    print "Create Called"
    return cls()


print "Test 1"

print "Test 2"
b = A.create()

python test.py
Test 1
Test 2
Create Called
Init Called

And if you change A.create to match the real code more closely:

  @classmethod
  def create(cls):
    print "Create Called"
    return True

then the output you get is:

python test.py 
Test 1
Test 2
Create Called

Cc: haddowk@chromium.org
Sorry if I am just confused. Can you point to the code you are looking at? I was trying to replicate this:

https://cs.corp.google.com/chromeos_public/src/third_party/autotest/files/site_utils/lxc/container.py?rcl=6ac1344dba6c0395a8725033521e4dd0d192f0be&l=193

Comment 14 by jkop@chromium.org, Jun 13 2018

Summary: push to prod broken by LXC (was: push to prod currently broken.)
That's not the method that's broken. That's not even being called.

Comment 15 by jkop@chromium.org, Jun 13 2018

@craigb, which log contains the excerpt you noted? Which machine was executing it? There are a lot of words there, but the important context is missing.

OK, I misunderstood your comments in #9 then; can you explain them some more?

Comment 17 by jkop@chromium.org, Jun 13 2018

Status: Fixed (was: Started)
Push to prod succeeded
