
Issue 678404

Issue metadata

Status: Fixed
Closed: Jan 2017
Pri: 2
Type: Bug

SlaveFreeDiskSpaceVeryLow on Swarming bots is difficult to respond to

Reported by katthomas@google.com, Jan 4 2017

Issue description

Currently, the trooper playbook entry for this alert doesn't really apply to Swarming bots.

https://buganizer.corp.google.com/issues/34049848 indicates it may be an issue with the alert threshold. Assigning to @vadimsh for now, since he seems to be the most knowledgeable.

In the meantime, I'm not sure what to do about https://b.corp.google.com/issues/34060298...
Cc: benhenry@chromium.org vadimsh@chromium.org dsansome@chromium.org mar...@chromium.org
Issue 678440 has been merged into this issue.

Comment 3 by bugdroid1@chromium.org, Jan 5 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/external/github.com/luci/luci-py.git/+/e7fcbce2f8fcb01c3785d42278a913a33eb16356

commit e7fcbce2f8fcb01c3785d42278a913a33eb16356
Author: vadimsh <vadimsh@chromium.org>
Date: Thu Jan 05 01:51:17 2017

Make Swarming bot keep 5%+250MB of the disk free.

Also add more comments for disk self-quarantine thresholds definition, since
they are somewhat confusing.

This introduces a new knob, instead of tweaking self-quarantine knob, because:
  1) We don't want to brick all bots that happen to have <5% && >4GB of free
     disk space.
  2) We want to have some breathing room between desired free disk space and
     the disk space that triggers the quarantine.

BUG=678404
R=dsansome@chromium.org

Review-Url: https://codereview.chromium.org/2614623004

[modify] https://crrev.com/e7fcbce2f8fcb01c3785d42278a913a33eb16356/appengine/swarming/swarming_bot/api/os_utilities.py
[modify] https://crrev.com/e7fcbce2f8fcb01c3785d42278a913a33eb16356/appengine/swarming/swarming_bot/bot_code/bot_main.py
[modify] https://crrev.com/e7fcbce2f8fcb01c3785d42278a913a33eb16356/appengine/swarming/swarming_bot/bot_code/bot_main_test.py
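The commit message describes two separate thresholds: the free space the bot tries to keep (5% + 250MB) and the lower level that triggers self-quarantine. The Python sketch below models how they interact. It assumes, based only on point 1 of the commit message, that self-quarantine fires when free space is below both 5% of the disk and 4GB (that is, below the smaller of the two). The names quarantine_threshold, desired_free_space and classify are hypothetical and are not the identifiers used in luci-py's os_utilities.py.

```python
"""Hypothetical model of the two disk thresholds described in the commit.

The min(5%, 4GB) quarantine rule and all names here are assumptions
inferred from the commit message, not luci-py's actual code.
"""

MB = 1024 * 1024
GB = 1024 * MB

# Assumed self-quarantine rule: free space below BOTH 5% of the disk and
# 4GB, i.e. below the smaller of the two values.
QUARANTINE_PERCENT = 5
QUARANTINE_ABSOLUTE = 4 * GB

# From the commit subject: the bot tries to keep 5% + 250MB of the disk free.
KEEP_PERCENT = 5
KEEP_EXTRA = 250 * MB


def quarantine_threshold(disk_size):
    """Free bytes below which the bot would self-quarantine (assumed rule)."""
    return min(disk_size * QUARANTINE_PERCENT // 100, QUARANTINE_ABSOLUTE)


def desired_free_space(disk_size):
    """Free bytes the bot tries to keep available on the partition."""
    return disk_size * KEEP_PERCENT // 100 + KEEP_EXTRA


def classify(disk_size, free):
    """Returns 'quarantine', 'cleanup' or 'ok' for a partition."""
    if free < quarantine_threshold(disk_size):
        return 'quarantine'
    if free < desired_free_space(disk_size):
        return 'cleanup'
    return 'ok'


if __name__ == '__main__':
    # 100GB disk, 4.5GB free: under 5% but above 4GB.
    print(classify(100 * GB, 4 * GB + 512 * MB))
    # 20GB disk, 1.1GB free: inside the 250MB "breathing room".
    print(classify(20 * GB, 1 * GB + 100 * MB))
```

With these assumptions, a 100GB disk quarantines below 4GB but aims to keep 5GB + 250MB free, so a bot with 4.5GB free is asked to clean up rather than being quarantined. Raising the quarantine threshold itself to 5% + 250MB would instead have taken that bot offline, which is the situation point 1 avoids. On a 20GB disk both thresholds start at 1GB, and the extra 250MB is the gap between the cleanup target and quarantine described in point 2.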

Status: Assigned (was: Untriaged)
Components: Infra>Platform>Swarming
Status: Fixed (was: Assigned)
This has been deployed. Swarming bots now try to keep 5%+250MB of the disk free.

If this proves too close to the alerting threshold, the following constant can be changed: https://github.com/luci/luci-py/blob/master/appengine/swarming/swarming_bot/api/os_utilities.py#L81