gs_offloader should never terminate because of a single job offload failure
Reported by jrbarnette@chromium.org, Jan 9 2018
Issue description

This is related to bug 800059. When the bad directory that couldn't be offloaded was encountered, the behavior of gs_offloader was to terminate, and write an exception like this _to stderr_:

Traceback (most recent call last):
  File "/usr/local/autotest/site-packages/chromite/lib/parallel.py", line 603, in TaskRunner
    task(*x, **task_kwargs)
  File "/usr/local/autotest/site-packages/chromite/lib/metrics.py", line 483, in wrapper
    return fn(*args, **kwargs)
  File "/usr/local/autotest/site_utils/gs_offloader.py", line 581, in offload
    stderr_file)
  File "/usr/local/autotest/site_utils/gs_offloader.py", line 627, in _offload
    sanitize_dir(dir_entry)
  File "/usr/local/autotest/site_utils/gs_offloader.py", line 208, in sanitize_dir
    _escape_rename_dir_contents(dirpath)
  File "/usr/local/autotest/site_utils/gs_offloader.py", line 224, in _escape_rename_dir_contents
    _escape_rename_dir_contents(path)
  File "/usr/local/autotest/site_utils/gs_offloader.py", line 224, in _escape_rename_dir_contents
    _escape_rename_dir_contents(path)
  File "/usr/local/autotest/site_utils/gs_offloader.py", line 224, in _escape_rename_dir_contents
    _escape_rename_dir_contents(path)
  File "/usr/local/autotest/site_utils/gs_offloader.py", line 224, in _escape_rename_dir_contents
    _escape_rename_dir_contents(path)
  File "/usr/local/autotest/site_utils/gs_offloader.py", line 224, in _escape_rename_dir_contents
    _escape_rename_dir_contents(path)
  File "/usr/local/autotest/site_utils/gs_offloader.py", line 218, in _escape_rename_dir_contents
    for filename in os.listdir(dirpath):
OSError: [Errno 13] Permission denied: 'hosts/chromeos6-row3-rack11-host11/1699958-provision/20180901052429/crashinfo.chromeos6-row3-rack11-host11/var/log/dp

This isn't an acceptable failure mode: No single job offload failure should terminate gs_offloader. The standard response to that sort of event is supposed to be "log it and move on".
We need to adjust the offload loop so that there's a try block at the highest level of the call chain for each individual directory. That try block should implement the "log it and move on" policy.
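
For illustration only, here is a minimal sketch of what a per-directory "log it and move on" loop could look like. It is not the actual gs_offloader code: the names `offload_each`, `offload_one`, and the logger name are hypothetical, and the real offload loop may be structured quite differently.

```python
import logging

# Hypothetical logger name, chosen for this sketch only.
logger = logging.getLogger("offload_sketch")


def offload_each(dir_entries, offload_one):
    """Call offload_one on every entry; log and skip any entry that raises.

    Both this function and the offload_one callable are assumptions made
    for illustration; they do not mirror the real gs_offloader API.
    Returns the list of entries whose offload raised an exception.
    """
    failed = []
    for entry in dir_entries:
        try:
            # The try block wraps the whole per-directory call chain, so an
            # error raised deep inside (e.g. an OSError from os.listdir)
            # is contained to this one entry.
            offload_one(entry)
        except Exception:
            # "Log it and move on": record the traceback, keep looping.
            logger.exception("Offload failed for %s; moving on", entry)
            failed.append(entry)
    return failed
```

The sketch catches `Exception` rather than using a bare `except:`, so `KeyboardInterrupt` and `SystemExit` can still stop the process. Returning the failed entries is one possible design choice that would let a caller retry or report them later.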
Jan 16 2018
already started
Jan 19 2018
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/third_party/autotest/+/dd8726b07f7b35da3a8592c4b9eef47f73301e09

commit dd8726b07f7b35da3a8592c4b9eef47f73301e09
Author: Richard Barnette <jrbarnette@chromium.org>
Date: Fri Jan 19 20:10:44 2018

[autotest] Protect gs_offloader from offload exceptions.

Unhandled exceptions raised when offloading a single directory could cause gs_offloader to terminate, and thus be unable to offload other directories. This adds a try-block to catch all exceptions to prevent individual directories from causing global problems.

BUG= chromium:800382
TEST=TBD

Change-Id: If44edf0567a547e3088c18c0c8709d61d8e87ac5
Reviewed-on: https://chromium-review.googlesource.com/858476
Commit-Ready: Richard Barnette <jrbarnette@chromium.org>
Tested-by: Richard Barnette <jrbarnette@chromium.org>
Reviewed-by: Don Garrett <dgarrett@chromium.org>

[modify] https://crrev.com/dd8726b07f7b35da3a8592c4b9eef47f73301e09/site_utils/gs_offloader.py
Jan 22 2018
Waiting on a push to prod; otherwise, this is fixed.
Comment 1 by jrbarnette@chromium.org, Jan 9 2018
Status: Assigned (was: Untriaged)