Issue 682405

Issue metadata

Status: Archived
Owner:
Closed: Mar 2017
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug

Blocked on:
issue 684106
issue 684122
issue 684132
issue 686904
issue 692788



gs_offloader is failing to offload on multiple shards/drones

Reported by jrbarnette@chromium.org, Jan 18 2017

Issue description

gs_offloader has been (plaintively) complaining about offload
failures since (roughly) mid-December.  The noise has been getting
progressively louder; it's now the single biggest source of complaint
noise on the chromeos-infra-alerts alias.

A spot-check says that the offload complaints are about specific
directories that never offload.  We should go figure out why they
fail and how to stop such failures in future.  We should also then
either delete the problem directories, or get them safely offloaded.

Here are the most recent sample complaints from
chromeos-server11.hot.corp.google.com:

First failure       Count   Directory name
=================== ======  ==============================
2017-01-13 15:21:26    117  92202181-chromeos-test
2017-01-13 15:21:28    116  hosts/chromeos4-row9-jetstream-host5/59253162-repair
2017-01-13 15:21:30    116  hosts/chromeos4-row10-jetstream-host7/59385679-repair
2017-01-13 15:21:32    116  hosts/chromeos4-row9-jetstream-host5/59252950-provision
2017-01-13 15:21:33    116  hosts/chromeos4-row9-jetstream-host3/59480989-provision

For comparison, complaints from cros-autotest-shard1.cbf.corp.google.com:
First failure       Count   Directory name
=================== ======  ==============================
2017-01-13 15:21:53    117  hosts/chromeos4-row10-jetstream-host5/59385677-repair
2017-01-13 15:22:00    117  hosts/chromeos4-row9-jetstream-host4/59308759-provision

The times I've spot-checked failures, complaints from
jetstream hosts and tests have been common.
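
For anyone who wants to repeat the spot-check without eyeballing the mail, here is a rough sketch of how the report rows could be tallied. It is not part of gs_offloader. The function names, the "job" label for non-host directories, and the assumption that each row is a timestamp, a count, and a directory on one line are all mine.

```python
import re
from collections import Counter

# One report row: "YYYY-MM-DD HH:MM:SS", failure count, directory name.
# Assumes the row is on a single line (the mail client may wrap it).
ROW = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+(\d+)\s+(\S+)$")

SAMPLE = """\
First failure       Count   Directory name
=================== ======  ==============================
2017-01-13 15:21:26    117  92202181-chromeos-test
2017-01-13 15:21:28    116  hosts/chromeos4-row9-jetstream-host5/59253162-repair
2017-01-13 15:21:30    116  hosts/chromeos4-row10-jetstream-host7/59385679-repair
2017-01-13 15:21:32    116  hosts/chromeos4-row9-jetstream-host5/59252950-provision
2017-01-13 15:21:33    116  hosts/chromeos4-row9-jetstream-host3/59480989-provision
"""


def parse_report(text):
    """Return (first_failure, count, directory) tuples; skip header lines."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            rows.append((match.group(1), int(match.group(2)), match.group(3)))
    return rows


def tally_by_kind(rows):
    """Count directories by kind.

    Directories under hosts/ are keyed by the suffix after the last '-'
    (e.g. 'repair', 'provision'); everything else is labelled 'job',
    which is a hypothetical label chosen for this sketch.
    """
    kinds = Counter()
    for _, _, directory in rows:
        if directory.startswith("hosts/"):
            kinds[directory.rsplit("-", 1)[-1]] += 1
        else:
            kinds["job"] += 1
    return kinds


def jetstream_rows(rows):
    """Return only the rows whose directory name mentions a jetstream host."""
    return [row for row in rows if "jetstream" in row[2]]


if __name__ == "__main__":
    rows = parse_report(SAMPLE)
    print(dict(tally_by_kind(rows)))
    print(len(jetstream_rows(rows)), "of", len(rows), "rows are jetstream")
```

Running the same functions over the reports from several servers would show whether the same handful of directories accounts for most of the noise.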

 

Comment 1 by dshi@chromium.org, Jan 23 2017

Owner: pprabhu@chromium.org
Blockedon: 684106
Owner: ----
Status: Available (was: Assigned)
I'm dealing with a specific case in issue 684106.
Might take a look at some others too...
Blockedon: 684122
Blockedon: 684132
Owner: pprabhu@chromium.org
Status: Assigned (was: Available)
Found at least three distinct failures. I say we should fix these three and then clean up all our servers.

Then, rinse and repeat.
Blockedon: 686904
Blockedon: 692788
Possible 4th failure mode: issue 692788

Please pick up this gs_offloader change:
https://chromium-review.googlesource.com/c/442630/

Status: Fixed (was: Assigned)
gs_offloader failed job counts are now available as a metric: http://shortn/_na5RjDYBdo and they're looking ...OK.... Marking Fixed.

Thanks lgoodby for the contributions :)

Re#8: That will make it in the next push-to-prod.

Comment 10 by dchan@google.com, May 30 2017

Labels: VerifyIn-60
Labels: VerifyIn-61

Comment 12 by dchan@chromium.org, Jan 22 2018

Status: Archived (was: Fixed)
