chromite unittest flake on TOT.
Issue description

This PreCQ build:
http://cros-goldeneye/chromeos/healthmonitoring/buildDetails?buildbucketId=8951227604493542672

failed with this error:

14:51:55: INFO: RunCommand: cros_sdk -- /mnt/host/source/chromite/cbuildbot/run_tests
run_tests: Unhandled exception:
Traceback (most recent call last):
  File "/mnt/host/source/chromite/cbuildbot/run_tests", line 169, in <module>
    DoMain()
  File "/mnt/host/source/chromite/cbuildbot/run_tests", line 165, in DoMain
    commandline.ScriptWrapperMain(FindTarget)
  File "/mnt/host/source/chromite/lib/commandline.py", line 911, in ScriptWrapperMain
    ret = target(argv[1:])
  File "/mnt/host/source/chromite/cbuildbot/run_tests.py", line 528, in main
    _ReExecuteIfNeeded([sys.argv[0]] + argv, opts.network)
  File "/mnt/host/source/chromite/cbuildbot/run_tests.py", line 474, in _ReExecuteIfNeeded
    cgroups.Cgroup.InitSystem()
  File "/mnt/host/source/chromite/lib/cros_build_lib.py", line 1926, in wrapper
    val = functor(obj)
  File "/mnt/host/source/chromite/lib/cgroups.py", line 147, in InitSystem
    _EnsureMounted(cls.CGROUP_ROOT, cgroup_root_args)
  File "/mnt/host/source/chromite/lib/cgroups.py", line 136, in _EnsureMounted
    osutils.SafeMakedirs(mnt, sudo=True)
  File "/mnt/host/source/chromite/lib/osutils.py", line 239, in SafeMakedirs
    os.makedirs(path, mode)
  File "/usr/lib64/python2.7/os.py", line 157, in makedirs
    mkdir(name, mode)
OSError: [Errno 2] No such file or directory: '/sys/fs/cgroup/cros'

14:51:58: ERROR: Traceback (most recent call last):
  File "/b/swarming/w/ir/cache/cbuild/repository/chromite/cbuildbot/stages/generic_stages.py", line 701, in Run
    self.PerformStage()
  File "/b/swarming/w/ir/cache/cbuild/repository/chromite/cbuildbot/stages/test_stages.py", line 469, in PerformStage
    cros_build_lib.RunCommand(cmd, enter_chroot=True)
  File "/b/swarming/w/ir/cache/cbuild/repository/chromite/lib/cros_build_lib.py", line 658, in RunCommand
    raise RunCommandError(msg, cmd_result)
RunCommandError: return code: 1; command: cros_sdk -- /mnt/host/source/chromite/cbuildbot/run_tests
cmd=['cros_sdk', '--', '/mnt/host/source/chromite/cbuildbot/run_tests']
,
Mar 23 2018
Going to the cited build page, I see that there's a link
labeled "Bot hostname". The link is broken:
https://uberchromegw.corp.google.com/i//buildslaves/swarm-cros-33
,
Mar 23 2018
There is a feedback button at the top right of that page; please report the broken link there. Here is the history of the builder in question, which is what I assume you wanted: https://chrome-swarming.appspot.com/bot?id=swarm-cros-32
,
Mar 24 2018
> Here is the history of the builder in question, which is what I assume you wanted.
Possibly. My thinking was more that "No such file or directory: '/sys/fs/cgroup/cros'" isn't an error that would occur only occasionally. I wanted to check whether the file could be confirmed as actually present on that bot, and also confirm that the file would still be present from inside the chroot.
,
Mar 24 2018
I think the cgroups are set up dynamically during chroot creation, so I'm not sure what that means. Possibly the chroot was somewhat corrupt?
,
Mar 24 2018
> I think the cgroups are set up dynamically during chroot creation, so I'm not sure what that means.
My reading of the test failure is that it was actually in the
process of creating the target location. That would mean we were
missing /sys/fs/cgroup. A simple check on my workstation suggests
that that directory gets created/mounted by cros_sdk (leastways,
that's the normal case):
$ cros_sdk -- ls -la /sys/fs/cgroup
total 0
dr-xr-xr-x 2 root root 0 Mar 6 16:07 .
drwxr-xr-x 9 root root 0 Mar 12 04:01 ..
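
The same check can be scripted rather than eyeballed. What follows is only a sketch, not anything from chromite: `is_mount_point` and `SAMPLE_MOUNTS` are hypothetical names, and the sample text is invented to show the /proc/mounts line format (device, mount point, filesystem type, options, ...).

```python
def is_mount_point(path, mounts_text):
    """Return True if `path` is listed as a mount point in /proc/mounts-style text."""
    for line in mounts_text.splitlines():
        fields = line.split()
        # Field layout: device, mount point, fstype, options, dump, pass.
        if len(fields) >= 2 and fields[1] == path:
            return True
    return False


# Invented sample data, for illustration only (not taken from the failing bot).
SAMPLE_MOUNTS = """\
sysfs /sys sysfs rw,nosuid,nodev,noexec,relatime 0 0
tmpfs /sys/fs/cgroup tmpfs rw,mode=755 0 0
"""
```

On a real machine the text would come from reading /proc/mounts, once on the host and once via `cros_sdk --`, to compare what each side sees for /sys/fs/cgroup.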
,
Mar 24 2018
Also, there's this:
  File "/mnt/host/source/chromite/lib/osutils.py", line 239, in SafeMakedirs
    os.makedirs(path, mode)
  File "/usr/lib64/python2.7/os.py", line 157, in makedirs
    mkdir(name, mode)
`os.makedirs()` is supposed to be proof against petty problems like
ENOENT. So, something went pretty durn wrong.
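
To illustrate the point about `os.makedirs()`: it creates missing parent directories before the leaf, so an ordinary "parent does not exist" ENOENT should not normally escape it, whereas plain `os.mkdir()` does fail that way. The sketch below works in a temporary directory; the path components merely mirror the ones in the traceback, and `try_mkdir` is a hypothetical helper, not chromite code.

```python
import errno
import os
import shutil
import tempfile


def try_mkdir(path):
    """Call os.mkdir and return the errno name on failure, or None on success."""
    try:
        os.mkdir(path)
    except OSError as e:
        return errno.errorcode.get(e.errno)
    return None


base = tempfile.mkdtemp()
nested = os.path.join(base, "fs", "cgroup", "cros")

# Plain mkdir fails: the parents "fs" and "fs/cgroup" do not exist yet.
mkdir_result = try_mkdir(nested)

# makedirs creates the missing parents first, then the leaf.
os.makedirs(nested)
makedirs_ok = os.path.isdir(nested)

# Clean up the scratch directory.
shutil.rmtree(base)
```

In the traceback above, the failing frame is the `mkdir(name, mode)` call inside `makedirs`, which suggests the parent had been treated as present by that point and the leaf creation itself returned ENOENT.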
,
Mar 26 2018
Is this a single time flake, or repeated? I only noticed because I was watching for changes in PreCQ behavior. If it's only happened once, there are a number of possible obscure causes (such as Puppet updating related packages as they were being used).
,
Mar 27 2018
> Is this a single time flake, or repeated?
I've no idea, and no good idea how to find out. Best we can say is a) it isn't common enough to trigger our sanity check builders, and b) nobody else has complained about it.
,
Mar 27 2018
,
Mar 29 2018
,
Mar 29 2018
Passing to vapier as one familiar with our use of cgroups.
,
Mar 30 2018
,
Mar 30 2018
,
Mar 30 2018
Another one: https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8950680208934166640 The first thing to check is that the server kernel supports cgroups, but I get a 404 getting buildslave info: https://uberchromegw.corp.google.com/i//buildslaves/swarm-cros-33
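
One way to check kernel cgroup support, once someone can log in to the bot, is to look for `cgroup` in /proc/filesystems. This is only a sketch under that assumption: `supports_filesystem` and `SAMPLE_FILESYSTEMS` are hypothetical names, and the sample text is invented to show the file's format (an optional `nodev` column, then the filesystem type).

```python
def supports_filesystem(fs_name, filesystems_text):
    """Return True if `fs_name` is listed in /proc/filesystems-style text."""
    for line in filesystems_text.splitlines():
        fields = line.split()
        # Each line is either "<fstype>" or "nodev <fstype>".
        if fields and fields[-1] == fs_name:
            return True
    return False


# Invented sample data, for illustration only (not taken from the failing bot).
SAMPLE_FILESYSTEMS = "nodev\tsysfs\nnodev\ttmpfs\nnodev\tcgroup\n\text4\n"
```

On the bot itself the text would come from reading /proc/filesystems.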
,
Mar 30 2018
Ah, that builder link is broken for swarming builds. I filed feedback to get it addressed. This URL can be used for now:
https://chrome-swarming.appspot.com/bot?id=swarm-cros-33
Hum... it looks like that builder IS in a bad state. I could reinstance it and fix the problem, but I'd rather understand what happened.
gwendal@, can you see if you have enough permissions to examine the builder here? If you do, you can both examine the serial console output and ssh in.
https://pantheon.corp.google.com/compute/instancesDetail/zones/us-central1-b/instances/swarm-cros-33?project=chromeos-bot&graph=GCE_CPU&duration=P2D
,
Mar 30 2018
Swarming currently considers that builder Dead. I wonder if this is somehow related to https://crbug.com/827305.
,
Apr 6 2018
For what it's worth, that builder had a long history of failing everything. I just reinstanced it in case it was corrupt.
,
Jul 20
Have we seen any more flakes?
,
Jul 20
Seems to be gone.
Comment 1 by dgarr...@chromium.org
, Mar 23 2018