
Issue 800382

Starred by 1 user

Issue metadata

Status: Fixed
Owner: jrbarnette@chromium.org
Closed: Jan 2018
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: Bug




gs_offloader should never terminate because of a single job offload failure

Reported by jrbarnette@chromium.org, Jan 9 2018

Issue description

This is related to bug 800059. When gs_offloader encountered
the bad directory that couldn't be offloaded, it terminated
and wrote an exception like this _to stderr_:
Traceback (most recent call last):
  File "/usr/local/autotest/site-packages/chromite/lib/parallel.py", line 603, in TaskRunner
    task(*x, **task_kwargs)
  File "/usr/local/autotest/site-packages/chromite/lib/metrics.py", line 483, in wrapper
    return fn(*args, **kwargs)
  File "/usr/local/autotest/site_utils/gs_offloader.py", line 581, in offload
    stderr_file)
  File "/usr/local/autotest/site_utils/gs_offloader.py", line 627, in _offload
    sanitize_dir(dir_entry)
  File "/usr/local/autotest/site_utils/gs_offloader.py", line 208, in sanitize_dir
    _escape_rename_dir_contents(dirpath)
  File "/usr/local/autotest/site_utils/gs_offloader.py", line 224, in _escape_rename_dir_contents
    _escape_rename_dir_contents(path)
  File "/usr/local/autotest/site_utils/gs_offloader.py", line 224, in _escape_rename_dir_contents
    _escape_rename_dir_contents(path)
  File "/usr/local/autotest/site_utils/gs_offloader.py", line 224, in _escape_rename_dir_contents
    _escape_rename_dir_contents(path)
  File "/usr/local/autotest/site_utils/gs_offloader.py", line 224, in _escape_rename_dir_contents
    _escape_rename_dir_contents(path)
  File "/usr/local/autotest/site_utils/gs_offloader.py", line 224, in _escape_rename_dir_contents
    _escape_rename_dir_contents(path)
  File "/usr/local/autotest/site_utils/gs_offloader.py", line 218, in _escape_rename_dir_contents
    for filename in os.listdir(dirpath):
OSError: [Errno 13] Permission denied: 'hosts/chromeos6-row3-rack11-host11/1699958-provision/20180901052429/crashinfo.chromeos6-row3-rack11-host11/var/log/dp

This isn't an acceptable failure mode:  No single job offload failure
should terminate gs_offloader.  The standard response to that sort
of event is supposed to be "log it and move on".

We need to adjust the offload loop so that there's a try block
at the highest level of the call chain for each individual directory.
That try block should implement the "log it and move on" policy.

 
Owner: jrbarnette@chromium.org
Status: Assigned (was: Untriaged)
Labels: -Chase-Pending
already started

Comment 3 by bugdroid1@chromium.org, Jan 19 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/autotest/+/dd8726b07f7b35da3a8592c4b9eef47f73301e09

commit dd8726b07f7b35da3a8592c4b9eef47f73301e09
Author: Richard Barnette <jrbarnette@chromium.org>
Date: Fri Jan 19 20:10:44 2018

[autotest] Protect gs_offloader from offload exceptions.

Unhandled exceptions raised when offloading a single directory
could cause gs_offloader to terminate, and thus be unable to offload
other directories.  This adds a try-block to catch all exceptions
to prevent individual directories from causing global problems.

BUG=chromium:800382
TEST=TBD

Change-Id: If44edf0567a547e3088c18c0c8709d61d8e87ac5
Reviewed-on: https://chromium-review.googlesource.com/858476
Commit-Ready: Richard Barnette <jrbarnette@chromium.org>
Tested-by: Richard Barnette <jrbarnette@chromium.org>
Reviewed-by: Don Garrett <dgarrett@chromium.org>

[modify] https://crrev.com/dd8726b07f7b35da3a8592c4b9eef47f73301e09/site_utils/gs_offloader.py

Status: Fixed (was: Assigned)
Waiting on a push to prod; otherwise, this is fixed.
