Fix moblab to work with the version of lxc provided by portage-stable
Issue description
https://uberchromegw.corp.google.com/i/chromeos/builders/guado_moblab-paladin/builds/7989
11/27 21:27:52.766 DEBUG| base_job:0357| Persistent state global_properties.fast now set to False
11/27 21:27:52.766 DEBUG| base_job:0357| Persistent state global_properties.max_result_size_KB now set to 20000
11/27 21:27:52.785 DEBUG| autotemp:0116| Clean was not called for /tmp/_autotmp_aeCTpYssh-master
11/27 21:27:52.809 INFO | connectionpool:0207| Starting new HTTP connection (1): metadata.google.internal
11/27 21:27:53.064 INFO | config:0024| Configuration file does not exist, ignoring: /etc/chrome-infra/ts-mon.json
11/27 21:27:53.065 ERROR| config:0244| ts_mon monitoring is disabled because the endpoint provided is invalid or not supported:
11/27 21:27:53.066 NOTIC| cros_logging:0038| ts_mon was set up.
11/27 21:27:53.066 DEBUG| autoserv:0264| Trying to start servod.
11/27 21:27:53.166 WARNI| autoserv:0272| Starting servod is aborted. The dut's servo_host attribute is not set to localhost.
11/27 21:27:53.166 DEBUG| utils:0212| Running 'sudo test -e "/mnt/moblab/containers/base_05/container_id.p"'
11/27 21:27:53.180 DEBUG| utils:0212| Running 'sudo lxc-ls --active'
11/27 21:27:53.197 DEBUG| utils:0212| Running 'sudo test -e "/mnt/moblab/containers/base_05/rootfs"'
11/27 21:27:53.212 DEBUG| utils:0212| Running 'cp /usr/local/autotest/results/drone_tmp/attach.7 /usr/local/autotest/results/2-moblab/192.168.231.101/attach.7'
11/27 21:27:53.220 DEBUG| utils:0212| Running 'sudo test -e "/mnt/moblab/containers/test_2_1511846872_15065"'
11/27 21:27:53.242 DEBUG| utils:0212| Running 'sudo -n virt-what'
11/27 21:27:53.259 WARNI| utils:2300| Package virt-what is not installed, default to assume it is not a virtual machine.
11/27 21:27:53.260 DEBUG| utils:0212| Running 'sudo lxc-clone --lxcpath /mnt/moblab/containers --newpath /mnt/moblab/containers --orig base_05 --new test_2_1511846872_15065 '
11/27 21:27:53.276 DEBUG| container_factory:0102| Creating snapshot clone failed. Attempting without snapshot...
11/27 21:27:53.278 DEBUG| utils:0212| Running 'sudo lxc-ls --active'
11/27 21:27:53.306 DEBUG| utils:0212| Running 'sudo test -e "/mnt/moblab/containers/base_05/rootfs"'
11/27 21:27:53.326 DEBUG| utils:0212| Running 'sudo test -e "/mnt/moblab/containers/base_05/container_id.p"'
11/27 21:27:53.342 INFO | server_job:0218| FAIL ---- ---- timestamp=1511846873 localtime=Nov 27 21:27:53 Failed to setup container for test: Command <sudo lxc-clone --lxcpath /mnt/moblab/containers --newpath /mnt/moblab/containers --orig base_05 --new test_2_1511846872_15065 > failed, rc=1, Command returned non-zero exit status * Command: sudo lxc-clone --lxcpath /mnt/moblab/containers --newpath /mnt/moblab/containers --orig base_05 --new test_2_1511846872_15065 Exit status: 1 Duration: 0.00917220115662 stderr: sudo: lxc-clone: command not found. Check logs in ssp_logs folder for more details.
11/27 21:27:53.343 DEBUG| utils:0212| Running 'sudo -n chown -R 246 "/usr/local/autotest/results/2-moblab/192.168.231.101"'
11/27 21:27:53.353 DEBUG| utils:0212| Running 'sudo -n chgrp -R 246 "/usr/local/autotest/results/2-moblab/192.168.231.101"'
11/27 21:27:53.362 ERROR| traceback:0013| Traceback (most recent call last):
11/27 21:27:53.362 ERROR| traceback:0013| File "/usr/local/autotest/server/autoserv", line 507, in run_autoserv
11/27 21:27:53.363 ERROR| traceback:0013| machines)
11/27 21:27:53.363 ERROR| traceback:0013| File "/usr/local/autotest/server/autoserv", line 168, in _run_with_ssp
11/27 21:27:53.363 ERROR| traceback:0013| dut_name=dut_name)
11/27 21:27:53.363 ERROR| traceback:0013| File "/usr/lib64/python2.7/site-packages/chromite/lib/metrics.py", line 483, in wrapper
11/27 21:27:53.364 ERROR| traceback:0013| return fn(*args, **kwargs)
11/27 21:27:53.364 ERROR| traceback:0013| File "/usr/local/autotest/site_utils/lxc/cleanup_if_fail.py", line 40, in func_cleanup_if_fail
11/27 21:27:53.364 ERROR| traceback:0013| return func(*args, **kwargs)
11/27 21:27:53.364 ERROR| traceback:0013| File "/usr/local/autotest/site_utils/lxc/container_bucket.py", line 153, in setup_test
11/27 21:27:53.364 ERROR| traceback:0013| self.container_path)
11/27 21:27:53.365 ERROR| traceback:0013| File "/usr/local/autotest/site_utils/lxc/container_factory.py", line 67, in create_container
11/27 21:27:53.365 ERROR| traceback:0013| lxc_path=lxc_path)
11/27 21:27:53.365 ERROR| traceback:0013| File "/usr/lib64/python2.7/site-packages/chromite/lib/metrics.py", line 483, in wrapper
11/27 21:27:53.366 ERROR| traceback:0013| return fn(*args, **kwargs)
11/27 21:27:53.366 ERROR| traceback:0013| File "/usr/local/autotest/site_utils/lxc/container_factory.py", line 100, in _create_from_base
11/27 21:27:53.366 ERROR| traceback:0013| cleanup=self._force_cleanup)
11/27 21:27:53.366 ERROR| traceback:0013| File "/usr/local/autotest/site_utils/lxc/container.py", line 223, in clone
11/27 21:27:53.367 ERROR| traceback:0013| new_container = cls(new_path, new_name, {}, src, snapshot)
11/27 21:27:53.367 ERROR| traceback:0013| File "/usr/local/autotest/site_utils/lxc/container.py", line 135, in __init__
11/27 21:27:53.367 ERROR| traceback:0013| self.name, snapshot)
11/27 21:27:53.367 ERROR| traceback:0013| File "/usr/local/autotest/site_utils/lxc/utils.py", line 88, in
11/27 21:27:53.368 ERROR| traceback:0013| utils.run(cmd)
11/27 21:27:53.368 ERROR| traceback:0013| File "/usr/local/autotest/client/common_lib/utils.py", line 738, in run
11/27 21:27:53.369 ERROR| traceback:0013| "Command returned non-zero exit status")
11/27 21:27:53.369 ERROR| traceback:0013| CmdError: Command <sudo lxc-clone --lxcpath /mnt/moblab/containers --newpath /mnt/moblab/containers --orig base_05 --new test_2_1511846872_15065 > failed, rc=1, Command returned non-zero exit status
11/27 21:27:53.369 ERROR| traceback:0013| * Command:
11/27 21:27:53.370 ERROR| traceback:0013| sudo lxc-clone --lxcpath /mnt/moblab/containers --newpath
11/27 21:27:53.370 ERROR| traceback:0013| /mnt/moblab/containers --orig base_05 --new test_2_1511846872_15065
11/27 21:27:53.370 ERROR| traceback:0013| Exit status: 1
11/27 21:27:53.370 ERROR| traceback:0013| Duration: 0.00917220115662
11/27 21:27:53.371 ERROR| traceback:0013|
11/27 21:27:53.371 ERROR| traceback:0013| stderr:
11/27 21:27:53.371 ERROR| traceback:0013| sudo: lxc-clone: command not found
11/27 21:27:53.378 ERROR| autoserv:0759| Uncaught SystemExit with code 1
Traceback (most recent call last):
File "/usr/local/autotest/server/autoserv", line 755, in main
use_ssp)
File "/usr/local/autotest/server/autoserv", line 562, in run_autoserv
sys.exit(exit_code)
SystemExit: 1
11/27 21:27:53.434 DEBUG| logging_manager:0627| Logging subprocess finished
11/27 21:27:53.434 DEBUG| logging_manager:0627| Logging subprocess finished
Suspecting there's a bad CL.
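In a log like this, the root cause is buried in a single stderr line. A minimal sketch for pulling it out, assuming every autoserv log line has the `MM/DD HH:MM:SS.mmm LEVEL| module:line| message` shape seen in this log (levels truncated to five characters, e.g. `WARNI`, `NOTIC`). `parse_line` and `missing_commands` are hypothetical helpers, not part of autotest.

```python
import re
from typing import Iterable, List, NamedTuple, Optional

# Assumed autoserv log format: "11/27 21:27:53.342 INFO | server_job:0218| message"
LOG_RE = re.compile(
    r"^(?P<ts>\d\d/\d\d \d\d:\d\d:\d\d\.\d{3}) "
    r"(?P<level>[A-Z]+)\s*\|\s*(?P<src>[\w.]+):(?P<line>\d+)\|\s?(?P<msg>.*)$"
)
# sudo reports a binary that is not on its PATH as "sudo: NAME: command not found".
MISSING_RE = re.compile(r"sudo: (?P<cmd>[\w.+-]+): command not found")


class LogEntry(NamedTuple):
    ts: str
    level: str
    src: str
    line: int
    msg: str


def parse_line(text: str) -> Optional[LogEntry]:
    """Split one log line into fields; return None for unstructured lines."""
    m = LOG_RE.match(text)
    if m is None:
        return None  # e.g. a bare traceback line with no timestamp prefix
    return LogEntry(m["ts"], m["level"], m["src"], int(m["line"]), m["msg"])


def missing_commands(lines: Iterable[str]) -> List[str]:
    """Return each command sudo could not find, in first-seen order, without duplicates."""
    found: List[str] = []
    for text in lines:
        for m in MISSING_RE.finditer(text):
            if m["cmd"] not in found:
                found.append(m["cmd"])
    return found
```

Run against this log, `missing_commands` should surface `lxc-clone`, which points at the lxc package rather than at autotest itself.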
Nov 28 2017
I can't find any related CL except this one: https://chromium-review.googlesource.com/c/chromiumos/overlays/portage-stable/+/784271
@dshi, could you verify whether this is caused by that CL or is a guado_moblab flake?
Nov 28 2017
Could be. lxc-clone is an old script that was replaced by lxc-copy in newer lxc releases, and the lxc upgrade may have removed the command completely. The lab is still on lxc 2, so we need to do some testing to see whether lxc-copy works on the lab servers as well. For moblab, we could replace lxc-clone with lxc-copy when autotest detects that it is running on moblab. +haddowk
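A minimal sketch of that fallback, assuming the clone command is built as an argv list and that finding `lxc-copy` on PATH is a good enough signal of a newer lxc. `build_clone_cmd` and its parameters are hypothetical, not autotest's actual API. The `lxc-clone` flags mirror the failing command in the log; the `lxc-copy` long options (`--name`, `--newname`, `--lxcpath`, `--newpath`, `--snapshot`) follow the lxc-copy man page.

```python
import shutil
from typing import Callable, List, Optional


def build_clone_cmd(orig: str, new: str, lxc_path: str, snapshot: bool = False,
                    which: Callable[[str], Optional[str]] = shutil.which) -> List[str]:
    """Return an argv list that clones container `orig` into `new` under lxc_path.

    `which` is injectable so the tool detection can be tested without lxc installed.
    """
    if which("lxc-copy"):
        # Newer lxc: lxc-copy replaces the removed lxc-clone script.
        cmd = ["sudo", "lxc-copy", "--lxcpath", lxc_path, "--newpath", lxc_path,
               "--name", orig, "--newname", new]
    else:
        # Older lxc (e.g. 1.0.x): same flags as the command autotest ran in the log.
        cmd = ["sudo", "lxc-clone", "--lxcpath", lxc_path, "--newpath", lxc_path,
               "--orig", orig, "--new", new]
    if snapshot:
        # Both tools accept --snapshot for a copy-on-write clone.
        cmd.append("--snapshot")
    return cmd
```

For example, `build_clone_cmd("base_05", "test_2_1511846872_15065", "/mnt/moblab/containers")` yields an `lxc-copy` command on a host with newer lxc. Detecting the tool directly, rather than checking whether autotest runs on moblab, lets the same code serve the lab once it upgrades. Note that `shutil.which` searches the caller's PATH, which may differ from sudo's `secure_path`.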
Nov 28 2017
Assigning to the CL's owner.
Nov 29 2017
Where do I find the logs from comment #1? I uploaded a CL to replace lxc-clone with lxc-copy: https://chromium-review.googlesource.com/c/chromiumos/third_party/autotest/+/794876
The guado_moblab-paladin-tryjob with that CL failed: https://uberchromegw.corp.google.com/i/chromiumos.tryserver/builders/paladin/builds/4559
But I can't find any logs that mention lxc-clone or lxc-copy the way the log in comment #1 does. The best I've been able to find is:
Traceback (most recent call last):
File "/usr/local/autotest/client/common_lib/test.py", line 631, in _exec
_call_test_function(self.execute, *p_args, **p_dargs)
File "/usr/local/autotest/client/common_lib/test.py", line 837, in _call_test_function
raise error.UnhandledTestFail(e)
UnhandledTestFail: Unhandled AutoservRunError: command execution error
* Command:
/usr/bin/ssh -a -x -o Protocol=2 -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o BatchMode=yes -o ConnectTimeout=30 -o ServerAliveInterval=900 -o ServerAliveCountMax=3 -o ConnectionAttempts=4 -l root -p 22 chromeos2-row2-rack8-host11 "export LIBC_FATAL_STDERR_=1; if type \"logger\" > /dev/null 2>&1; then logger -tag \"autotest\" \"server[stack::run_once|run_as_moblab|run] -> ssh_run(su - moblab -c '/usr/local/autotest/site_utils/run_suite.py --pool='' --board=cyan --build=cyan-release/R62-9901.66.0 --suite_name=dummy_server --retry=True --max_retries=1')\";fi; su - moblab -c '/usr/local/autotest/site_utils/run_suite.py --pool='' --board=cyan --build=cyan-release/R62-9901.66.0 --suite_name=dummy_server --retry=True --max_retries=1'"
Exit status: 1
Duration: 489.806571007
This looks to me like the ssh command failed, but it doesn't say anything about why the underlying call to run_suite.py failed. What's the magic location for the log from comment #1?
Nov 29 2017
https://uberchromegw.corp.google.com/i/chromeos/builders/guado_moblab-paladin/builds/7989
=> [Test-Logs]: moblab_RunSuite: FAIL: Unhandled AutoservRunError: command execution error
=> https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/159006017-chromeos-test/chromeos2-row1-rack8-host1/
=> download moblab_RunSuite.tgz and extract it
=> moblab_RunSuite/sysinfo/reboot_current/mnt/moblab/results/4-moblab/192.168.231.101/ssp_logs/debug/autoserv.DEBUG
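Once `moblab_RunSuite.tgz` is downloaded, the SSP log can be located without extracting the whole archive. A minimal sketch using Python's standard `tarfile` module, assuming the archive is gzip-compressed (as the `.tgz` suffix suggests) and that the log path always ends in `ssp_logs/debug/autoserv.DEBUG` as in the path above. `find_ssp_autoserv_logs` and `read_member` are hypothetical helper names.

```python
import tarfile
from typing import List

# Assumed suffix, taken from the path in the steps above.
SSP_LOG_SUFFIX = "ssp_logs/debug/autoserv.DEBUG"


def find_ssp_autoserv_logs(tgz_path: str) -> List[str]:
    """Return the archive paths of every SSP autoserv.DEBUG log in the tarball."""
    with tarfile.open(tgz_path, "r:gz") as tar:
        return [m.name for m in tar.getmembers()
                if m.isfile() and m.name.endswith(SSP_LOG_SUFFIX)]


def read_member(tgz_path: str, member_name: str) -> str:
    """Read one archive member as text without writing anything to disk."""
    with tarfile.open(tgz_path, "r:gz") as tar:
        f = tar.extractfile(member_name)
        if f is None:  # extractfile returns None for non-regular members
            raise ValueError(member_name + " is not a regular file")
        return f.read().decode("utf-8", errors="replace")
```

Because the moblab results directory (`4-moblab/192.168.231.101` here) varies from run to run, matching on the suffix avoids hard-coding the full path.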
Dec 2 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/overlays/board-overlays/+/7aa8d5e7c02525f0544e9ed35e444f34ba0f2c9d
commit 7aa8d5e7c02525f0544e9ed35e444f34ba0f2c9d
Author: Chirantan Ekbote <chirantan@chromium.org>
Date: Sat Dec 02 06:45:28 2017
project-moblab: Copy app-emulation/lxc and mask newer versions
Copy app-emulation/lxc from portage-stable into the project-moblab directory and mask newer versions in the moblab overlay because they break moblab. This allows us to update the version of lxc in portage-stable.
BUG=chromium:789062
TEST='cros tryjob --hwtest guado_moblab-paladin-tryjob'
Change-Id: I7cbf4dc445db9e7e3b38b11615b1d2bd8292094f
Signed-off-by: Chirantan Ekbote <chirantan@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/804814
Reviewed-by: Mike Frysinger <vapier@chromium.org>
[add] https://crrev.com/7aa8d5e7c02525f0544e9ed35e444f34ba0f2c9d/project-moblab/profiles/base/package.mask
[add] https://crrev.com/7aa8d5e7c02525f0544e9ed35e444f34ba0f2c9d/project-moblab/app-emulation/lxc/files/lxc.initd.2
[add] https://crrev.com/7aa8d5e7c02525f0544e9ed35e444f34ba0f2c9d/project-moblab/app-emulation/lxc/files/lxc.initd.3
[add] https://crrev.com/7aa8d5e7c02525f0544e9ed35e444f34ba0f2c9d/project-moblab/app-emulation/lxc/metadata.xml
[add] https://crrev.com/7aa8d5e7c02525f0544e9ed35e444f34ba0f2c9d/project-moblab/app-emulation/lxc/lxc-1.0.7.ebuild
[add] https://crrev.com/7aa8d5e7c02525f0544e9ed35e444f34ba0f2c9d/project-moblab/app-emulation/lxc/files/lxc_at.service
[add] https://crrev.com/7aa8d5e7c02525f0544e9ed35e444f34ba0f2c9d/project-moblab/profiles/base/eapi
[add] https://crrev.com/7aa8d5e7c02525f0544e9ed35e444f34ba0f2c9d/project-moblab/app-emulation/lxc/files/lxc-1.0.6-bash-completion.patch
[add] https://crrev.com/7aa8d5e7c02525f0544e9ed35e444f34ba0f2c9d/project-moblab/app-emulation/lxc/Manifest
Dec 4 2017
I've landed a temporary workaround to pin the version used by moblab to 1.0.7. Changing this bug to be about fixing moblab to work with the new version.
Mar 29 2018
Sameer, could you please find somebody to fix Chirantan's technical debt? This is breaking moblab and causing problems running server tests via lxc down the road.
Mar 29 2018
How exactly is this my technical debt?
Mar 29 2018
I believe you landed this TODO: https://chromium-review.googlesource.com/#/c/chromiumos/overlays/board-overlays/+/804814/3/project-moblab/profiles/base/package.mask
# Mask newer versions of lxc because they break moblab.
# TODO(crbug.com/789062): Fix moblab to work with newer versions of lxc
# and drop the old version in this overlay.
This TODO has *your* name on it, even if you avoided typing it.
Mar 29 2018
crbug.com/789062 is quite a novel way to spell "chirantan".
Mar 29 2018
As someone with experience being an ass: Stop being an ass. Both of you. A quick fix for a bug you were owner of, which still needs to be fixed eventually, is technical debt someone in the moblab team needs to deal with.
Mar 29 2018
I'm not arguing that it's not technical debt. My point is that this is something that has always existed:
* moblab depends on lxd
* newer versions of lxd depend on criu
* criu's build system is set up in a way that makes cross-compiling fail
* upgrading lxd to a newer version requires fixing criu
Even if I hadn't landed a workaround to unblock a separate project, _moblab would still have this exact problem_. The only difference is that I wouldn't be involved in the discussion in any way. I'm only objecting to the idea that it's somehow my fault that we're in this situation. I'm more than happy to help get criu fixed so that we can drop this workaround.
As for being an ass, I 100% agree that I was being an ass, but I _really_ don't like people randomly CC'ing my manager and throwing me under the bus for stuff that is only tangentially related to me.
Mar 30 2018
My sincere apologies to Chirantan for misreading the situation! I was following the revision history and didn't read this issue carefully enough.
Mar 30 2018
That said, lxc is kernel code and owned by the kernel team. I don't see the infra team resolving problems with it. lxc is required to run moblab, a board which is mission critical. It was agreed that moblab is used by CrOS partners to qualify builds. If moblab's lxc cannot keep up with the lxc in the ChromeOS lab, then moblab will sooner or later diverge and fail, and partners will not deal with it anymore. This is why I feel so strongly about it. Chirantan, I am very grateful that you were able to see past my abrasiveness and offered your help with uprevving lxc. I do understand, though, if Sameer needs to look for somebody else.
Mar 30 2018
Thanks, all, for bringing this discussion back on the rails. Much appreciated. Chirantan will be chatting with Keith from the moblab team to figure out the next steps. Keith is out today, so the earliest this can happen is Monday. Assigning to Chirantan for now.
Apr 3 2018
May 12 2018
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/overlays/board-overlays/+/feae14de476398a6695ca4e9f5c920a8127e19b3
commit feae14de476398a6695ca4e9f5c920a8127e19b3
Author: Keith Haddow <haddowk@chromium.org>
Date: Sat May 12 06:22:07 2018
[moblab] Move moblab to lxc 2.1.1
- override cgroups prefix mount
- remove the pin to prior lxc version
- Add use flags to support new lxc package and criu
- Update base container init for new image format
CQ-DEPEND=CL:1054342,CL:1038126,CL:1054409
TEST=Built and tested on moblab
BUG=chromium:789062
Change-Id: I826b36230ae5e1d17c5442eb1d83746ebe96e9c0
Reviewed-on: https://chromium-review.googlesource.com/1054184
Commit-Ready: Keith Haddow <haddowk@chromium.org>
Tested-by: Keith Haddow <haddowk@chromium.org>
Reviewed-by: Jason Clinton <jclinton@chromium.org>
[modify] https://crrev.com/feae14de476398a6695ca4e9f5c920a8127e19b3/project-moblab/profiles/base/package.mask
[delete] https://crrev.com/36514f44a8a10009a13c8dec22a31a3ea7ab0825/project-moblab/app-emulation/lxc/files/lxc.initd.2
[delete] https://crrev.com/36514f44a8a10009a13c8dec22a31a3ea7ab0825/project-moblab/app-emulation/lxc/files/lxc.initd.3
[modify] https://crrev.com/feae14de476398a6695ca4e9f5c920a8127e19b3/project-moblab/chromeos-base/chromeos-bsp-moblab/files/init/moblab-base-container-init.conf
[delete] https://crrev.com/36514f44a8a10009a13c8dec22a31a3ea7ab0825/project-moblab/app-emulation/lxc/metadata.xml
[delete] https://crrev.com/36514f44a8a10009a13c8dec22a31a3ea7ab0825/project-moblab/app-emulation/lxc/lxc-1.0.7.ebuild
[delete] https://crrev.com/36514f44a8a10009a13c8dec22a31a3ea7ab0825/project-moblab/app-emulation/lxc/files/lxc_at.service
[modify] https://crrev.com/feae14de476398a6695ca4e9f5c920a8127e19b3/project-moblab/profiles/base/make.defaults
[delete] https://crrev.com/36514f44a8a10009a13c8dec22a31a3ea7ab0825/project-moblab/app-emulation/lxc/files/lxc-1.0.6-bash-completion.patch
[delete] https://crrev.com/36514f44a8a10009a13c8dec22a31a3ea7ab0825/project-moblab/app-emulation/lxc/Manifest
[modify] https://crrev.com/feae14de476398a6695ca4e9f5c920a8127e19b3/project-moblab/profiles/base/package.use
[modify] https://crrev.com/feae14de476398a6695ca4e9f5c920a8127e19b3/project-moblab/chromeos-base/chromeos-bsp-moblab/chromeos-bsp-moblab-9999.ebuild
[add] https://crrev.com/feae14de476398a6695ca4e9f5c920a8127e19b3/project-moblab/chromeos-base/chromeos-bsp-moblab/files/cgroups.override
May 12 2018
Strictly speaking, lxc comes from chromiumos-overlay rather than portage-stable, but the essence of the bug is fixed.
Comment 1 by xixuan@chromium.org, Nov 28 2017