Issue 825387

Starred by 4 users

Issue metadata

Status: Archived
Owner:
Closed: Jul 20
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: Bug




chromite unittest flake on TOT.

Project Member Reported by dgarr...@chromium.org, Mar 23 2018

Issue description

This PreCQ build:

http://cros-goldeneye/chromeos/healthmonitoring/buildDetails?buildbucketId=8951227604493542672

failed with this error:

14:51:55: INFO: RunCommand: cros_sdk -- /mnt/host/source/chromite/cbuildbot/run_tests
run_tests: Unhandled exception:
Traceback (most recent call last):
  File "/mnt/host/source/chromite/cbuildbot/run_tests", line 169, in <module>
    DoMain()
  File "/mnt/host/source/chromite/cbuildbot/run_tests", line 165, in DoMain
    commandline.ScriptWrapperMain(FindTarget)
  File "/mnt/host/source/chromite/lib/commandline.py", line 911, in ScriptWrapperMain
    ret = target(argv[1:])
  File "/mnt/host/source/chromite/cbuildbot/run_tests.py", line 528, in main
    _ReExecuteIfNeeded([sys.argv[0]] + argv, opts.network)
  File "/mnt/host/source/chromite/cbuildbot/run_tests.py", line 474, in _ReExecuteIfNeeded
    cgroups.Cgroup.InitSystem()
  File "/mnt/host/source/chromite/lib/cros_build_lib.py", line 1926, in wrapper
    val = functor(obj)
  File "/mnt/host/source/chromite/lib/cgroups.py", line 147, in InitSystem
    _EnsureMounted(cls.CGROUP_ROOT, cgroup_root_args)
  File "/mnt/host/source/chromite/lib/cgroups.py", line 136, in _EnsureMounted
    osutils.SafeMakedirs(mnt, sudo=True)
  File "/mnt/host/source/chromite/lib/osutils.py", line 239, in SafeMakedirs
    os.makedirs(path, mode)
  File "/usr/lib64/python2.7/os.py", line 157, in makedirs
    mkdir(name, mode)
OSError: [Errno 2] No such file or directory: '/sys/fs/cgroup/cros'
14:51:58: ERROR: Traceback (most recent call last):
  File "/b/swarming/w/ir/cache/cbuild/repository/chromite/cbuildbot/stages/generic_stages.py", line 701, in Run
    self.PerformStage()
  File "/b/swarming/w/ir/cache/cbuild/repository/chromite/cbuildbot/stages/test_stages.py", line 469, in PerformStage
    cros_build_lib.RunCommand(cmd, enter_chroot=True)
  File "/b/swarming/w/ir/cache/cbuild/repository/chromite/lib/cros_build_lib.py", line 658, in RunCommand
    raise RunCommandError(msg, cmd_result)
RunCommandError: return code: 1; command: cros_sdk -- /mnt/host/source/chromite/cbuildbot/run_tests
cmd=['cros_sdk', '--', '/mnt/host/source/chromite/cbuildbot/run_tests']

 
Cc: jrbarnette@chromium.org
Going to the cited build page, I see that there's a link
labeled "Bot hostname".  The link is broken:
    https://uberchromegw.corp.google.com/i//buildslaves/swarm-cros-33

There is a feedback button at the top right of that page; please report the broken link there.

Here is the history of the builder in question, which is what I assume you wanted.

https://chrome-swarming.appspot.com/bot?id=swarm-cros-33
> Here is the history of the builder in question, which is what I assume you wanted.

Possibly.  My thinking was more that "No such file or directory: '/sys/fs/cgroup/cros'"
isn't an error that should occur only occasionally.  I wanted to check
whether the path could be confirmed as actually present on that bot, and
also confirm that it would still be present from inside the chroot.

Cc: vapier@chromium.org
I think the cgroups are set up dynamically during chroot creation, so I'm not sure what that means.

Possibly the chroot was somewhat corrupt?
> I think the cgroups are set up dynamically during chroot creation, so I'm not sure what that means.

My reading of the test failure is that it was actually in the
process of creating the target location.  That would mean we were
missing /sys/fs/cgroup.  A simple check on my workstation suggests
that that directory gets created/mounted by cros_sdk (at least,
that's the normal case):
    $ cros_sdk -- ls -la /sys/fs/cgroup
    total 0
    dr-xr-xr-x 2 root root 0 Mar  6 16:07 .
    drwxr-xr-x 9 root root 0 Mar 12 04:01 ..
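To answer the "is it really there, on the bot and inside the chroot?" question directly, one could run a small check once on the bot and once inside the chroot. This is a minimal sketch; `describe_path` is a hypothetical helper, not part of chromite, and it only reports what `os.path` can see from wherever it is run:

```python
import os


def describe_path(path):
    """Hypothetical diagnostic helper (not part of chromite).

    Reports whether `path` exists, is a directory, and is a mount point,
    as seen from the current process's view of the filesystem.
    """
    return {
        "path": path,
        "exists": os.path.exists(path),
        "is_dir": os.path.isdir(path),
        # False for paths that do not exist, rather than raising.
        "is_mount": os.path.ismount(path),
    }


if __name__ == "__main__":
    # The two paths involved in the traceback in this issue.
    for p in ("/sys/fs/cgroup", "/sys/fs/cgroup/cros"):
        print(describe_path(p))
```

Comparing the output from the host with the output from inside the chroot would show whether `/sys/fs/cgroup` is missing outright or merely not mounted in one of the two environments.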

Also, there's this:
  File "/mnt/host/source/chromite/lib/osutils.py", line 239, in SafeMakedirs
    os.makedirs(path, mode)
  File "/usr/lib64/python2.7/os.py", line 157, in makedirs
    mkdir(name, mode)

`os.makedirs()` creates any missing parent directories, so it's
supposed to be proof against petty problems like ENOENT.  Something
went pretty durn wrong.
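The traceback ends at the final `mkdir(name, mode)` call in Python 2.7's `makedirs`, with no nested `makedirs` frame. That suggests the parent `/sys/fs/cgroup` either already existed or had just been created before `mkdir` reported ENOENT. The simplified sketch below is modeled on Python 2.7's `os.makedirs` recursion (written in syntax that also runs on Python 3). It shows one way this can happen: the parent is removed by something else between the existence check and the final `mkdir`. The `between` hook is hypothetical and exists only to simulate that race deterministically.

```python
import errno
import os
import shutil
import tempfile


def makedirs_like_py27(name, mode=0o777, between=None):
    """Simplified sketch of Python 2.7's os.makedirs recursion.

    `between` is a hypothetical hook, not part of os.makedirs: it runs after
    the parent has been checked/created and before the final mkdir, so a
    concurrent change to the parent can be simulated deterministically.
    """
    head, tail = os.path.split(name)
    if not tail:  # Handle a trailing slash, e.g. "a/b/".
        head, tail = os.path.split(head)
    if head and tail and not os.path.exists(head):
        try:
            makedirs_like_py27(head, mode)
        except OSError as e:
            # Like Python 2.7: tolerate a parent created by someone else.
            if e.errno != errno.EEXIST:
                raise
        if tail == os.curdir:  # "xxx/newdir/." exists if "xxx/newdir" does.
            return
    if between is not None:
        between()
    # Python 2.7 has no exist_ok; any error here (ENOENT included) propagates.
    os.mkdir(name, mode)


if __name__ == "__main__":
    tmp = tempfile.mkdtemp()
    parent = os.path.join(tmp, "cgroup")
    try:
        # Remove the freshly created parent just before the final mkdir.
        makedirs_like_py27(os.path.join(parent, "cros"),
                           between=lambda: shutil.rmtree(parent))
    except OSError as e:
        print(errno.errorcode[e.errno], e)  # ENOENT, despite makedirs
    finally:
        shutil.rmtree(tmp)
```

This does not say what removed or hid the parent on the bot. It only shows that ENOENT at that line means the parent was gone at the moment of the last `mkdir`, not that `makedirs` skipped creating it.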

Is this a single time flake, or repeated? I only noticed because I was watching for changes in PreCQ behavior.

If it's only happened once, there are a number of possible obscure causes (such as Puppet updating related packages as they were being used).

> Is this a single time flake, or repeated?

I've no idea, and no good idea how to find out.  Best we can say is
a) it isn't common enough to trigger our sanity check builders, and
b) nobody else has complained about it.
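One hedged way to answer "single flake or repeated?" is to collect the logs of recent PreCQ builds, search them for the exact error string, and group the hits by bot. If the hits cluster on one bot, that points at the machine rather than at chromite. This is a minimal sketch that assumes the logs have already been downloaded. The `(build_id, bot_id, log_text)` input format and `count_flakes` are hypothetical, and fetching the logs is out of scope:

```python
import collections

# Error signature copied from the traceback in this issue.
SIGNATURE = "No such file or directory: '/sys/fs/cgroup/cros'"


def count_flakes(builds):
    """Hypothetical helper: find builds whose log contains SIGNATURE.

    `builds` is an iterable of (build_id, bot_id, log_text) tuples.
    Returns the matching build ids (in input order) and a Counter of
    matches per bot.
    """
    per_bot = collections.Counter()
    hits = []
    for build_id, bot_id, log_text in builds:
        if SIGNATURE in log_text:
            per_bot[bot_id] += 1
            hits.append(build_id)
    return hits, per_bot
```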

Status: WontFix (was: Untriaged)
Cc: bmgordon@chromium.org ayatane@chromium.org
Status: Untriaged (was: WontFix)
Cc: jclinton@chromium.org
Owner: vapier@chromium.org
Passing to vapier as one familiar with our use of cgroups.
Components: Infra>Client>ChromeOS>CI
Components: -Infra>Client>ChromeOS
Another one:

https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8950680208934166640

The first thing to check is that the server kernel supports cgroups, but I get a 404 when fetching the buildslave info: https://uberchromegw.corp.google.com/i//buildslaves/swarm-cros-33
Cc: jinjingl@chromium.org
Ah, that builder link is broken for swarming builds. I filed feedback to get it addressed.

This URL can be used for now:
  https://chrome-swarming.appspot.com/bot?id=swarm-cros-33

Hum... it looks like that builder IS in a bad state. I could reinstance it and fix the problem, but I'd rather understand what happened.

gwendal@, can you check whether you have enough permissions to examine the builder here? If you do, you can examine the serial console output and also ssh in.

https://pantheon.corp.google.com/compute/instancesDetail/zones/us-central1-b/instances/swarm-cros-33?project=chromeos-bot&graph=GCE_CPU&duration=P2D

Swarming currently considers that builder Dead. I wonder if this is somehow related to  https://crbug.com/827305 .
For what it's worth, that builder had a long history of failing everything. I just reinstanced it in case it was corrupt.
Have we seen any more flakes?

Status: Archived (was: Untriaged)
Seems to be gone.
