
Issue 740411

Starred by 1 user

Issue metadata

Status: WontFix
Owner:
Closed: Aug 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: ----

Blocking:
issue 740420




gale & whirlwind DUTs failed due to "Not enough free inodes on /mnt/stateful_partition"

Project Member Reported by oka@chromium.org, Jul 10 2017

Issue description

gale-paladin:3139 failed

Builders failed on: 
- gale-paladin: 
  https://luci-milo.appspot.com/buildbot/chromeos/gale-paladin/3139


History of gale-paladin: 8 failed build(s) in a row; Last 10 builds: 8 failed, 2 pass

Is it a network failure?
https://luci-logdog.appspot.com/v/?s=chromeos%2Fbb%2Fchromeos%2Fgale-paladin%2F3139%2F%2B%2Frecipes%2Fsteps%2FHWTest__jetstream_cq_%2F0%2Fstdout
 

Comment 1 by oka@chromium.org, Jul 10 2017

swarming.py run failed with many 400 OUT_OF_RANGE errors.

07-09-2017 [15:31:47] Created suite job: http://cautotest.corp.google.com/afe/#tab_id=view_job&object_id=127770633
--create_and_return was specified, terminating now.
Will return from run_suite with status: OK
15:31:50: INFO: RunCommand: /b/c/cbuild/repository/chromite/third_party/swarming.client/swarming.py run --swarming chromeos-proxy.appspot.com --task-summary-json /tmp/cbuildbot-tmp9Mab21/tmpeXF_VC/temp_summary.json --raw-cmd --task-name gale-paladin/R61-9729.0.0-rc1-jetstream_cq --dimension os Ubuntu-14.04 --dimension pool default --print-status-updates --timeout 14400 --io-timeout 14400 --hard-timeout 14400 --expiration 1200 '--tags=priority:Build' '--tags=suite:jetstream_cq' '--tags=build:gale-paladin/R61-9729.0.0-rc1' '--tags=task_name:gale-paladin/R61-9729.0.0-rc1-jetstream_cq' '--tags=board:gale' -- /usr/local/autotest/site_utils/run_suite.py --build gale-paladin/R61-9729.0.0-rc1 --board gale --suite_name jetstream_cq --pool cq --num 6 --file_bugs False --priority Build --timeout_mins 180 --retry True --max_retries 5 --offload_failures_only False --job_keyvals "{'cidb_build_stage_id': 49415577L, 'cidb_build_id': 1652956, 'datastore_parent_key': ('Build', 1652956, 'BuildStage', 49415577L)}" -m 127770633
15:36:19: WARNING: HttpsMonitor.send received status 400: {
  "error": {
    "code": 400,
    "message": "Operation was attempted past the valid range.",
    "status": "OUT_OF_RANGE"
  }
}
...
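
To get a rough count of how many of these warnings a log contains, something like the following Python sketch could be used. It assumes the log has the same shape as the excerpt above (a `HttpsMonitor.send received status NNN: ` prefix followed by a JSON body); the function name `count_monitor_errors` is hypothetical and is not part of chromite or swarming.

```python
import json
import re
from collections import Counter

# Matches the warning prefix seen in the log excerpt; the JSON body follows it.
_WARNING = re.compile(r"HttpsMonitor\.send received status (\d+): ")


def count_monitor_errors(log_text):
    """Count warnings by (HTTP status, error.status) in a cbuildbot-style log.

    Hypothetical helper: assumes each warning is followed by a JSON object.
    """
    counts = Counter()
    decoder = json.JSONDecoder()
    for match in _WARNING.finditer(log_text):
        http_status = int(match.group(1))
        try:
            # raw_decode parses one JSON value starting at the given index.
            body, _end = decoder.raw_decode(log_text, match.end())
        except ValueError:
            body = None
        error_status = None
        if isinstance(body, dict) and isinstance(body.get("error"), dict):
            error_status = body["error"].get("status")
        counts[(http_status, error_status)] += 1
    return counts


SAMPLE = """15:36:19: WARNING: HttpsMonitor.send received status 400: {
  "error": {
    "code": 400,
    "message": "Operation was attempted past the valid range.",
    "status": "OUT_OF_RANGE"
  }
}
"""

print(count_monitor_errors(SAMPLE))
# Counter({(400, 'OUT_OF_RANGE'): 1})
```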

Comment 2 by oka@chromium.org, Jul 10 2017

Owner: xixuan@chromium.org
xixuan@, could you take a look?

Comment 3 by oka@chromium.org, Jul 10 2017

Summary: gale-paladin:3139 failed (continuous CQ failure) (was: gale-paladin:3139 failed)

Comment 4 by oka@chromium.org, Jul 10 2017

Is it related to the recent swarming proxy outage, crbug.com/738139?

Comment 5 by oka@chromium.org, Jul 10 2017

Another possibility is that a bad change landed between 9725.0.0 and 9726.0.0:
https://crosland.corp.google.com/log/9725.0.0..9726.0.0

Comment 6 by oka@chromium.org, Jul 10 2017

Cc: vapier@chromium.org
The range contains https://chromium-review.googlesource.com/c/529365
(enable network sandbox for builds).
Could it be related?


Comment 7 by oka@chromium.org, Jul 10 2017

Blocking: 740420

Comment 8 by vapier@chromium.org, Jul 10 2017

it is probably not related to the network sandbox change.  that should only impact build and unittest phases.  this error is in the hwtest phase and we don't run ebuild commands there.

Comment 9 by xixuan@chromium.org, Jul 10 2017

Cc: akes...@chromium.org
Components: Infra>Client>ChromeOS
My guess at the cause of the failure is that none of the cq DUTs for gale are healthy:

https://chromeos-proxy.appspot.com/task?id=3744eb065fafff10&refresh=10&show_raw=1

It is not related to the swarming proxy.
Cc: jrbarnette@chromium.org skau@chromium.org
I checked 3 of the 6 failed gale DUTs; they all failed in the same pattern:

After a new test's provision:

Reset failed: due to "Not enough free inodes on /mnt/stateful_partition", example: https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/hosts/chromeos4-row9-jetstream-host3/60954194-reset

Repair failed:  repair.rpm, repair.jetstream_repair, repair.au & repair.powerwash failed, example: https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/hosts/chromeos4-row9-jetstream-host3/60954202-repair

The open questions are 1) why there are not enough free inodes, and 2) why we can't repair the DUTs.
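
For anyone reproducing the reset failure by hand, the free-inode condition can be checked with a short Python sketch. This is not the autotest verifier itself: the function names and the `min_free` threshold are hypothetical, and the actual limit used by the lab's reset check is not stated in this bug. Only `os.statvfs` is a real standard-library call.

```python
import os


def free_inode_stats(path):
    """Return (free, total) inode counts for the filesystem containing path."""
    st = os.statvfs(path)
    # f_favail: inodes available to unprivileged users; f_files: total inodes.
    return st.f_favail, st.f_files


def has_enough_free_inodes(path, min_free):
    """True if the filesystem at path has at least min_free free inodes.

    min_free is a caller-supplied assumption, not the lab's real threshold.
    """
    free, _total = free_inode_stats(path)
    return free >= min_free


if __name__ == "__main__":
    # On a DUT the path of interest would be /mnt/stateful_partition;
    # "/" is used here so the sketch runs on any Unix machine.
    free, total = free_inode_stats("/")
    print("free inodes: %d of %d" % (free, total))
```

Running `df -i /mnt/stateful_partition` on the DUT should report comparable numbers and is a quick cross-check.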
Summary: gale DUTs failed due to "Not enough free inodes on /mnt/stateful_partition" (was: gale-paladin:3139 failed (continuous CQ failure))
Filed https://b.corp.google.com/issues/63524032 first. Letting sheriff @skau debug why this happens to all gale DUTs.
Summary: gale & whirlwind DUTs failed due to "Not enough free inodes on /mnt/stateful_partition" (was: gale DUTs failed due to "Not enough free inodes on /mnt/stateful_partition")

Comment 13 by skau@chromium.org, Jul 10 2017

I currently suspect that a bad CL got into the CQ and killed all the DUTs.  We'll see what happens after the lab recovers them.
Gale repairs were failing due to servos being inaccessible. That has been fixed now, b/63506983. I don't know why the inode failure is showing up now.

Comment 15 by skau@chromium.org, Jul 10 2017

The inode failure might be a bad CL in CQ.  I'll keep an eye on it.

Comment 16 by sheriffbot@chromium.org, Jul 24 2017

Pri-0 bugs are critical regressions or serious emergencies, and this bug has not been updated in three days. Could you please provide an update, or adjust the priority to a more appropriate level if applicable?

If a fix is in active development, please set the status to Started.

Thanks for your time! To disable nags, add the Disable-Nags label.

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot

Comment 17 by sheriffbot@chromium.org, Aug 8 2017

Pri-0 bugs are critical regressions or serious emergencies, and this bug has not been updated in three days. Could you please provide an update, or adjust the priority to a more appropriate level if applicable?

If a fix is in active development, please set the status to Started.

Thanks for your time! To disable nags, add the Disable-Nags label.

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Labels: -Pri-0 Pri-1
Status: WontFix (was: Available)
About a month has passed, so I assume it's either already fixed or no longer happening. Marking it as WontFix.
