Containers cannot start on Rodete
Issue description
On ToT (git SHA e38310559ebc0c5695c27ab4492683977d461fba), on my workstation, lxc_functional_test reliably fails. Here is the most relevant chunk of error log:
jkop@jkop:~/chromiumos/src/third_party/autotest/files/site_utils/lxc$ ./lxc_functional_test.py
DEBUG:root:Running 'sudo -n true'
2018-03-14 18:44:39,214.214 INFO |lxc_functional_tes:0184| MainThread(140254145861376)| Rebuild base container in folder /usr/local/autotest/containers/container_test_HfP4DQ.
2018-03-14 18:44:49,766.766 INFO |lxc_functional_tes:0187| MainThread(140254145861376)| Base container created: base
2018-03-14 18:44:49,946.946 INFO |lxc_functional_tes:0200| MainThread(140254145861376)| Create test container.
2018-03-14 18:44:51,450.450 ERROR|lxc_functional_tes:0363| MainThread(140254145861376)| ERROR:
Traceback (most recent call last):
File "./lxc_functional_test.py", line 359, in <module>
main(options)
File "./lxc_functional_test.py", line 344, in main
container = setup_test(bucket, container_id, options.skip_cleanup)
File "./lxc_functional_test.py", line 206, in setup_test
dut_name='192.168.0.3')
File "/usr/local/google/home/jkop/chromiumos/src/third_party/autotest/files/site-packages/chromite/lib/metrics.py", line 483, in wrapper
return fn(*args, **kwargs)
File "/usr/local/google/home/jkop/chromiumos/src/third_party/autotest/files/site_utils/lxc/cleanup_if_fail.py", line 40, in func_cleanup_if_fail
return func(*args, **kwargs)
File "/usr/local/google/home/jkop/chromiumos/src/third_party/autotest/files/site_utils/lxc/container_bucket.py", line 202, in setup_test
container = self._factory.create_container(container_id)
File "/usr/local/google/home/jkop/chromiumos/src/third_party/autotest/files/site_utils/lxc/container_factory.py", line 71, in create_container
new_container = self._create_from_base(name, lxc_path)
File "/usr/local/google/home/jkop/chromiumos/src/third_party/autotest/files/site-packages/chromite/lib/metrics.py", line 483, in wrapper
return fn(*args, **kwargs)
File "/usr/local/google/home/jkop/chromiumos/src/third_party/autotest/files/site_utils/lxc/container_factory.py", line 116, in _create_from_base
cleanup=self._force_cleanup)
File "/usr/local/google/home/jkop/chromiumos/src/third_party/autotest/files/site_utils/lxc/container.py", line 227, in clone
new_name)
ContainerError: Container test_123_1521078289_77533 already exists.
(complete log here: gpaste/5686101524086784)
Other LXC tests also fail, but this one is significantly less likely to be a result of bad config on my workstation.
,
Mar 15 2018
How do I run lxc_functional_test.py? Do I need to run any command to create the container first? Running it directly fails:
nxia@nxia:~/chromiumos/src/third_party/autotest/files/site_utils/lxc$ ./lxc_functional_test.py
Traceback (most recent call last):
File "./lxc_functional_test.py", line 33, in <module>
prefix='container_test_')
File "/usr/lib/python2.7/tempfile.py", line 339, in mkdtemp
_os.mkdir(file, 0700)
OSError: [Errno 2] No such file or directory: '/usr/local/autotest/containers/container_test_DfCiFI'
,
Mar 15 2018
Hmm, that's a plausible cause for the issue. I just created the directory with mkdir, I think, and had forgotten.
,
Mar 15 2018
Do I need to test anything else?
,
Mar 15 2018
Try creating the dir with mkdir and running it then. If that passes, I'll investigate and eventually get back to you; if it fails with a similar log to mine, then no need for anything else, just mark it confirmed.
,
Mar 15 2018
Hit the same error
2018-03-15 12:10:49,754.754 INFO |lxc_functional_tes:0367| MainThread(140297052468992)| Cleaning up temporary directory /usr/local/autotest/containers/container_test_MThD75.
Traceback (most recent call last):
File "./lxc_functional_test.py", line 359, in <module>
main(options)
File "./lxc_functional_test.py", line 344, in main
container = setup_test(bucket, container_id, options.skip_cleanup)
File "./lxc_functional_test.py", line 206, in setup_test
dut_name='192.168.0.3')
File "/usr/local/google/home/nxia/chromiumos/src/third_party/autotest/files/site-packages/chromite/lib/metrics.py", line 483, in wrapper
return fn(*args, **kwargs)
File "/usr/local/google/home/nxia/chromiumos/src/third_party/autotest/files/site_utils/lxc/cleanup_if_fail.py", line 40, in func_cleanup_if_fail
return func(*args, **kwargs)
File "/usr/local/google/home/nxia/chromiumos/src/third_party/autotest/files/site_utils/lxc/container_bucket.py", line 202, in setup_test
container = self._factory.create_container(container_id)
File "/usr/local/google/home/nxia/chromiumos/src/third_party/autotest/files/site_utils/lxc/container_factory.py", line 71, in create_container
new_container = self._create_from_base(name, lxc_path)
File "/usr/local/google/home/nxia/chromiumos/src/third_party/autotest/files/site-packages/chromite/lib/metrics.py", line 483, in wrapper
return fn(*args, **kwargs)
File "/usr/local/google/home/nxia/chromiumos/src/third_party/autotest/files/site_utils/lxc/container_factory.py", line 116, in _create_from_base
cleanup=self._force_cleanup)
File "/usr/local/google/home/nxia/chromiumos/src/third_party/autotest/files/site_utils/lxc/container.py", line 227, in clone
new_name)
autotest_lib.client.common_lib.error.ContainerError: Container test_123_1521141048_155403 already exists.
,
Mar 15 2018
,
Mar 15 2018
,
Mar 15 2018
This is a result of lxc-clone not being available on my workstation, and could be fixed by migrating to lxc-copy.
,
Mar 16 2018
That was insufficient to fix the problem. Starting the container still fails, stopped deliberately by AppArmor. gpaste/6488436445806592
,
Mar 16 2018
Got a more detailed log and updated the paste. Here's the most relevant bit:
lxc-start 20180316002846.452 INFO lxc_cgfsng - cgroups/cgfsng.c:cgfsng_setup_limits:1991 - cgroup has been setup
lxc-start 20180316002846.452 INFO lxc_start - start.c:do_start:836 - Unshared CLONE_NEWCGROUP.
lxc-start 20180316002846.452 WARN lxc_apparmor - lsm/apparmor.c:apparmor_process_label_set:218 - Incomplete AppArmor support in your kernel
lxc-start 20180316002846.452 ERROR lxc_apparmor - lsm/apparmor.c:apparmor_process_label_set:220 - If you really want to start this container, set
lxc-start 20180316002846.452 ERROR lxc_apparmor - lsm/apparmor.c:apparmor_process_label_set:221 - lxc.aa_allow_incomplete = 1
lxc-start 20180316002846.452 ERROR lxc_apparmor - lsm/apparmor.c:apparmor_process_label_set:222 - in your container configuration file
lxc-start 20180316002846.452 ERROR lxc_sync - sync.c:__sync_wait:57 - An error occurred in another process (expected sequence number 5)
lxc-start 20180316002846.453 ERROR lxc_start - start.c:__lxc_start:1354 - Failed to spawn container "test_123_1521150125_207921".
lxc-start 20180316002846.453 WARN lxc_monitor - monitor.c:lxc_monitor_fifo_send:111 - Failed to open fifo to send message: No such file or directory.
lxc-start 20180316002846.453 WARN lxc_monitor - monitor.c:lxc_monitor_fifo_send:111 - Failed to open fifo to send message: No such file or directory.
lxc-start 20180316002846.453 INFO lxc_conf - conf.c:run_script_argv:427 - Executing script "/usr/share/lxcfs/lxc.reboot.hook" for container "test_123_1521150125_207921", config section "lxc".
lxc-start 20180316002846.963 ERROR lxc_start_ui - tools/lxc_start.c:main:366 - The container failed to start.
lxc-start 20180316002846.963 ERROR lxc_start_ui - tools/lxc_start.c:main:370 - Additional information can be obtained by setting the --logfile and --logpriority options.
,
Mar 16 2018
The command I ran to get that log was `sudo lxc-start -P /usr/local/autotest/containers/container_test_L2LhnJ -n test_123_1521150125_207921 -dFo ~/container_log_L2LhnJ.log --logpriority=DEBUG`, results are in gpaste/6488436445806592
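A DEBUG-priority log like the one above is long, so it can help to pull out only the warnings and errors. The following is a minimal Python 3 sketch, not part of autotest; the function name `lxc_log_lines` is hypothetical, and it assumes every log line follows the layout seen in the excerpt (program name, timestamp, level, then the message).

```python
def lxc_log_lines(log_text, levels=("ERROR", "WARN")):
    """Yield log lines whose third whitespace-separated field is in `levels`.

    Assumes the layout seen in the excerpt:
    "lxc-start <timestamp> <LEVEL> <module> - <message>".
    """
    for line in log_text.splitlines():
        fields = line.split(None, 3)
        if len(fields) >= 3 and fields[2] in levels:
            yield line
```

Calling `list(lxc_log_lines(open(path).read()))` on the saved log file would leave just the AppArmor and spawn failures to read through.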
,
Mar 16 2018
This works on a drone with the same code that fails on my workstation.
,
Mar 16 2018
This is at least partially a Rodete problem, but doesn't seem to be entirely a Rodete problem.
,
Mar 16 2018
I'm not sure whether this applies to all containers or just the LXC pool zygotes, but I have determined what is blocking the tests. When running lxc-start (precise command and log as mentioned in #12), startup is blocked by AppArmor, an access control framework used in Ubuntu and Debian. It does not appear to exist on servers but is present on workstations.
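One quick way to compare a workstation against a server is to check whether the kernel reports AppArmor as enabled. This is a minimal Python 3 sketch, assuming the usual sysfs location `/sys/module/apparmor/parameters/enabled`, which reads `Y` when the module is active; the helper name `apparmor_enabled` is made up for illustration and is not part of autotest.

```python
def apparmor_enabled(path="/sys/module/apparmor/parameters/enabled"):
    """Return True if the kernel reports AppArmor as enabled.

    A missing or unreadable file is treated as "not enabled".
    """
    try:
        with open(path) as f:
            return f.read().strip() == "Y"
    except OSError:
        return False
```

This only says whether AppArmor is active at all. It does not tell you whether the kernel carries the extra patches that lxc expects.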
,
Mar 16 2018
Might be related to the work in b/71629580?
,
Mar 19 2018
Unrelated to the work in b/71629580. Rodete includes a kernel with AppArmor support, and lxc has support for using AppArmor to restrict what's happening within a container. Unfortunately the lxc support requires additional patches that are carried in the Ubuntu kernel and aren't in the mainline kernel. I think lxc's behaviour here is a bug: it shouldn't depend on features that aren't mainline, and should degrade gracefully instead. You can work around this by setting the lxc.aa_allow_incomplete = 1 option as indicated in the error message.
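Before relying on that workaround, it is worth confirming that the container's config file really contains the setting. Below is a rough Python 3 sketch that scans config text for the key. The function name is hypothetical, it does not follow `lxc.include` directives, and treating the last occurrence as the effective one is an assumption rather than documented lxc behaviour.

```python
def aa_allow_incomplete_enabled(config_text):
    """Return True if the config text sets lxc.aa_allow_incomplete to 1.

    Assumption: if the key appears more than once, the last value wins.
    """
    enabled = False
    for line in config_text.splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep and key.strip() == "lxc.aa_allow_incomplete":
            enabled = value.strip() == "1"
    return enabled
```

For example, `aa_allow_incomplete_enabled(open(config_path).read())` could be run against the test container's `config` file, where `config_path` is wherever that file lives on your machine.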
,
Mar 19 2018
,
Mar 19 2018
Re #17: That 'workaround' was already in place when I got the error messages above. It has no effect.
,
Mar 19 2018
Well, in that case it seems like a bug in lxc - opening a bug against that in rodete seems reasonable.
,
Mar 19 2018
Filed; Rodete bug submission rules say to go through Techstop first, so I've done that.
,
Mar 31 2018
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/third_party/autotest/+/bc9842db7e514a9c7c624c139cc88dd92f6b5a30
commit bc9842db7e514a9c7c624c139cc88dd92f6b5a30
Author: Jacob Kopczynski <jkop@google.com>
Date: Sat Mar 31 04:53:35 2018
autotest: lxc-functional-test creates directory
The functional tests created temp containers within a directory but did not check whether that directory existed. Now it does, and creates it.
BUG=chromium:822112
TEST=removed directory and ran test
Change-Id: I5da9376b4c0a8b581ad318819355b05427b5fcd3
Reviewed-on: https://chromium-review.googlesource.com/964930
Commit-Ready: Jacob Kopczynski <jkop@chromium.org>
Tested-by: Jacob Kopczynski <jkop@chromium.org>
Reviewed-by: Jacob Kopczynski <jkop@chromium.org>
[modify] https://crrev.com/bc9842db7e514a9c7c624c139cc88dd92f6b5a30/site_utils/lxc/lxc_functional_test.py
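The kind of fix described in that commit message can be sketched as follows. This is not the code from the change itself, just a minimal Python 3 illustration (the test at the time ran under Python 2.7, where `exist_ok` is not available). The helper name `make_test_dir` is hypothetical, and creating directories under /usr/local/autotest may need elevated permissions.

```python
import os
import tempfile


def make_test_dir(parent, prefix="container_test_"):
    """Create `parent` if it is missing, then a unique temp dir inside it."""
    # exist_ok=True makes this a no-op when the directory is already there.
    os.makedirs(parent, exist_ok=True)
    return tempfile.mkdtemp(dir=parent, prefix=prefix)
```

Without the `os.makedirs` call, `tempfile.mkdtemp` raises `OSError: [Errno 2] No such file or directory` when the parent is absent, which matches the traceback earlier in this thread.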
,
Apr 28 2018
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/third_party/autotest/+/c54fecdc26aac0d9c3d08dcae6e9cfa7660adecd
commit c54fecdc26aac0d9c3d08dcae6e9cfa7660adecd
Author: Jacob Kopczynski <jkop@google.com>
Date: Sat Apr 28 04:27:38 2018
autotest: lxc: use lxc-copy where available
On shards, we have lxc-copy, which is preferred. On moblab, we still use an old version of lxc, which does not have lxc-copy and still uses the deprecated lxc-clone. This checks for the existence of lxc-copy, uses it if able, and falls back to the old command if necessary.
BUG=chromium:822112
TEST=Hwtest tryjob on moblab
Change-Id: Ie0dbe9048ef503db371a3f74c42b81ec6c6767b0
Reviewed-on: https://chromium-review.googlesource.com/965398
Commit-Ready: Jacob Kopczynski <jkop@chromium.org>
Tested-by: Jacob Kopczynski <jkop@chromium.org>
Reviewed-by: Laurence Goodby <lgoodby@chromium.org>
Reviewed-by: Dan Shi <dshi@google.com>
[modify] https://crrev.com/c54fecdc26aac0d9c3d08dcae6e9cfa7660adecd/site_utils/lxc/utils.py
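The "use lxc-copy if present, otherwise fall back" logic from that commit message might look roughly like this. It is a Python 3 sketch under my own assumptions, not the contents of site_utils/lxc/utils.py. The name `pick_clone_tool` is invented for illustration, and the sketch only picks the executable without building the full command line.

```python
import shutil


def pick_clone_tool(which=shutil.which):
    """Return 'lxc-copy' when it is on PATH, else fall back to 'lxc-clone'.

    `which` is injectable so the lookup can be faked in tests.
    """
    for tool in ("lxc-copy", "lxc-clone"):
        if which(tool):
            return tool
    raise RuntimeError("neither lxc-copy nor lxc-clone was found on PATH")
```

Checking for the binary at call time, rather than hard-coding one command, lets the same code run on shards with a newer lxc and on moblab with the older one.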
,
Jun 7 2018
,
Jun 7 2018
Comment 1 by jkop@chromium.org
, Mar 15 2018