
Issue 638638

Issue metadata

Status: Duplicate
Closed: Aug 2016
OS: Chrome
Pri: 2

Add Shard Disk utilization metric to monarch

Project Member Reported by xixuan@chromium.org, Aug 17 2016

Issue description

A shard's disk can fill up when it accumulates too many result logs (e.g. large crash reports). Once the disk is full, some services stop working; for example, shard RPCs fail.

Repartitioning the shard to keep the logs separate from other services may be a good solution.

 

Comment 1 by aut...@google.com, Aug 23 2016

Owner: shuqianz@chromium.org
Charlene, can you take a look at this for feasibility? 

Another option could be to just cap the size / quantity of the logs that we store. 

https://bugs.chromium.org/p/chromium/issues/detail?id=638641 MAY also help some.... 
> Another option could be to just cap the size / quantity of
> the logs that we store. 

Rotating logs would be good, but log size isn't quite the problem.
The problem is that logs and test results are in the same file
system.  So, when the file system fills up because of a sudden
excess of test results, we lose the ability to write logs.

So, capping logs won't help; we need to cap test results.

Also, I'm not aware of any mechanism we have available that would
make it easy to cap either log sizes or results directory sizes.
Perhaps Linux supports putting a size limit on a directory hierarchy?
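A kernel-enforced limit is not shown here. As a userspace alternative, the following Python sketch caps a results directory by deleting its oldest top-level entries (by modification time) until the total size fits a byte budget. The names `prune_results`, `tree_size`, and `entry_size`, the one-entry-per-job layout, and the oldest-first policy are all hypothetical; this issue does not say how results are laid out on a shard.

```python
import os
import shutil
import stat


def tree_size(path):
    """Total bytes of regular files under path; symlinks are skipped, not followed."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            if stat.S_ISREG(st.st_mode):
                total += st.st_size
    return total


def entry_size(path):
    """Size of one top-level entry: whole tree for a real directory, else file size (0 for links)."""
    if os.path.isdir(path) and not os.path.islink(path):
        return tree_size(path)
    st = os.lstat(path)
    return st.st_size if stat.S_ISREG(st.st_mode) else 0


def prune_results(results_dir, max_bytes):
    """Remove the oldest top-level entries of results_dir until it holds <= max_bytes.

    Hypothetical helper. Returns the names of removed entries, oldest first.
    """
    entries = [os.path.join(results_dir, name) for name in os.listdir(results_dir)]
    # Oldest first by modification time (lstat, so symlinks are not followed).
    entries.sort(key=lambda p: os.lstat(p).st_mtime)
    sizes = {p: entry_size(p) for p in entries}
    total = sum(sizes.values())
    removed = []
    for path in entries:
        if total <= max_bytes:
            break
        if os.path.isdir(path) and not os.path.islink(path):
            shutil.rmtree(path)
        else:
            os.remove(path)  # a regular file, or the symlink itself (not its target)
        total -= sizes[path]
        removed.append(os.path.basename(path))
    return removed
```

Such a script only runs periodically (for example from cron), so unlike a filesystem-level limit it cannot stop a sudden burst of results from filling the disk between runs; it narrows the window rather than closing it.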


> https://bugs.chromium.org/p/chromium/issues/detail?id=638641 MAY
> also help some.... 

Well, that fix is mostly orthogonal.  It could make the problem of
"the disk filled up" less common, but it can't mitigate the failure
once it's occurred.

Cc: -ayatane@chromium.org
Owner: ayatane@chromium.org
Summary: Add Shard Disk utilization metric to monarch (was: Repartition the shard to separate the results from the logs and other services.)
I think the best solution is to become aware of disk problems before they crash the servers. Thanks to ayatane@, a shard disk utilization metric will be added to Monarch. The plan is to send an alert when disk utilization reaches 75% or more. Once this metric is in place, we will be notified of potential disk problems and can prevent them from crashing the server.
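
The alerting rule above can be sketched in Python as a utilization check plus a threshold comparison. This is an illustration under assumptions: `disk_utilization_percent`, `should_alert`, and the path being checked are hypothetical, and neither the Monarch metric name nor the client used to report it is given in this issue, so reporting is left out.

```python
import shutil

# Threshold from the plan in this issue: alert when utilization >= 75%.
ALERT_THRESHOLD_PERCENT = 75.0


def disk_utilization_percent(path="/"):
    """Percentage in use of the filesystem that contains `path` (hypothetical helper).

    Computed as used / (used + free). On Unix, shutil.disk_usage reports `free` as the
    space available to unprivileged users, so this is close to what `df` shows as Use%.
    """
    usage = shutil.disk_usage(path)
    denominator = usage.used + usage.free
    if denominator == 0:
        return 0.0
    return 100.0 * usage.used / denominator


def should_alert(utilization_percent, threshold=ALERT_THRESHOLD_PERCENT):
    """True once utilization reaches the threshold (inclusive, matching '>= 75%')."""
    return utilization_percent >= threshold
```

Because logs and test results share one filesystem on the shard (per the earlier comment), checking any path on that filesystem covers both.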

ayatane@, I'm not sure whether you already have a bug for the metric. If you do, you can merge this one into it.
Mergedinto: 621741
Status: Duplicate (was: Untriaged)
