
Issue 654953

Starred by 3 users

Issue metadata

Status: Archived
Owner:
Closed: Jan 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug




leave less garbage in /tmp on chromeos4-devserver5 (possibly other devservers too)

Project Member Reported by semenzato@chromium.org, Oct 12 2016

Issue description

On chromeos4-devserver5 there are over 58,000 files in /tmp.  Are we keeping those files for a reason?

Most of the /tmp entries are directories and look like these (plain "ls" takes a bit over 1 second, even when cached).

...
cros-update01PfYV
cros-update01st8N
cros-update01zJS1
cros-update0249he
cros-update02Awj5
cros-update02d3Ci
...

They go as far back as September 2, which is the time of the last reboot.
 
Sorry, correction, those are directories, not files.

The total disk usage (computed by sampling, or else it's too slow) is about 100 GB.
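For anyone who wants to reproduce this kind of estimate, here is a minimal sketch of the sampling idea: measure a random subset of the cros-update* directories and scale the mean by the total count. This is not the command that was actually run on the devserver; the function names, the default sample size, and the prefix argument are all assumptions made for illustration.

```python
import os
import random


def dir_size_bytes(path):
    """Sum the apparent sizes of files under path (symlinks are not followed)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.lstat(os.path.join(root, name)).st_size
            except OSError:
                pass  # the file vanished while we were walking
    return total


def estimate_usage(parent, prefix="cros-update", sample_size=200, seed=None):
    """Return (directory_count, estimated_total_bytes) for parent/prefix*.

    Hypothetical helper: only sample_size directories are walked, and their
    mean size is multiplied by the number of matching directories.
    """
    with os.scandir(parent) as it:
        names = [e.name for e in it
                 if e.name.startswith(prefix)
                 and e.is_dir(follow_symlinks=False)]
    if not names:
        return 0, 0
    rng = random.Random(seed)
    sample = rng.sample(names, min(sample_size, len(names)))
    mean = sum(dir_size_bytes(os.path.join(parent, n)) for n in sample) / len(sample)
    return len(names), int(mean * len(names))
```

The result is only as good as the sample: if a few directories are much larger than the rest, a small sample can be far off, so treat the number as a rough order of magnitude.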
Cc: jrbarnette@chromium.org akes...@chromium.org
Adding folks randomly.
Cc: xixuan@chromium.org
Labels: -Pri-2 Pri-1
Status: Available (was: Untriaged)
I'm going to guess that these are products of the new
provision code not cleaning up after itself.

I was assuming that by filing bugs under this category someone would see them and triage them, but maybe I was wrong?  Should I just assign them randomly in the future?

Comment 5 by xixuan@chromium.org, Oct 14 2016

Owner: xixuan@chromium.org
Status: Assigned (was: Available)
These come from provision jobs. Will create a job to delete them regularly.
Is it useful to leave them around for a while, or could they be removed at the end of the task that created them?

Comment 7 by xixuan@chromium.org, Oct 14 2016

These directories were deliberately not deleted right after a provision task finished, since I thought someone might want to check the logs if they were not properly transferred to the shard/drone. However, the 'collect_au_log' RPC seems to be very stable and never fails.

So after talking offline with Richard, I will delete them as soon as the provision task is finished.
Well, you have to wait to delete it until after the collect_au_log RPC is called, right? 
Cc: ayatane@chromium.org
R#8, right
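To make the ordering constraint concrete, here is a rough sketch of "collect first, then delete". It is not the devserver code: collect_then_cleanup and the collect callback are hypothetical stand-ins for whatever handles the collect_au_log RPC.

```python
import shutil


def collect_then_cleanup(log_dir, collect):
    """Call collect(log_dir), then remove log_dir.

    collect is a hypothetical callback standing in for the log transfer.
    The directory is removed only after collect has been called, and it is
    removed even if collect raises, so nothing is left behind in /tmp.
    """
    try:
        return collect(log_dir)
    finally:
        shutil.rmtree(log_dir, ignore_errors=True)
```

Deleting in the finally clause trades debuggability for cleanliness: if the transfer fails, the logs are gone. Moving the rmtree call after a successful return would keep them on failure, at the cost of leaking directories again.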
Issue 652200 has been merged into this issue.
Cc: cywang@chromium.org
Project Member

Comment 13 by bugdroid1@chromium.org, Oct 17 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/platform/dev-util/+/1bbfaba3a6b3487fd8edc000e761f155b2c0a665

commit 1bbfaba3a6b3487fd8edc000e761f155b2c0a665
Author: xixuan <xixuan@chromium.org>
Date: Fri Oct 14 00:53:22 2016

Devserver: delete execute_log file for provision.

Previously, execute_log for provision is preserved in devserver for possible
future investigating. However, experience shows that they're barely checked.

This CL deletes the provision execute_log after it's transferred back to
shard/drone.

BUG= chromium:654953 
TEST=Run repair in local autotest with local devserver, to check whether the
file is transferred back and also deleted in /tmp/.

Change-Id: I62c6b1371eba5ca9b11c716ec1fcab111ce93efa
Reviewed-on: https://chromium-review.googlesource.com/398423
Commit-Ready: Xixuan Wu <xixuan@chromium.org>
Tested-by: Xixuan Wu <xixuan@chromium.org>
Reviewed-by: Allen Li <ayatane@chromium.org>

[modify] https://crrev.com/1bbfaba3a6b3487fd8edc000e761f155b2c0a665/cros_update_progress.py
[modify] https://crrev.com/1bbfaba3a6b3487fd8edc000e761f155b2c0a665/devserver.py

Update:

Another CL is being prepared to avoid leaving garbage behind.

Also, a script is now running to delete the garbage that is more than 2 days old.
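For reference, a cleanup of this kind could look roughly like the sketch below. This is not the script that is actually running; the function name, the dry_run flag, and the use of directory mtime as the age are assumptions for illustration.

```python
import os
import shutil
import time


def remove_stale_dirs(parent="/tmp", prefix="cros-update", max_age_days=2,
                      now=None, dry_run=True):
    """Remove parent/prefix* directories older than max_age_days.

    Returns the list of paths that were (or, with dry_run, would be) removed.
    Age is taken from the directory's own mtime, which only changes when
    entries are added to or removed from that directory.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    with os.scandir(parent) as it:
        candidates = [e.path for e in it
                      if e.name.startswith(prefix)
                      and e.is_dir(follow_symlinks=False)]
    removed = []
    for path in candidates:
        try:
            if os.lstat(path).st_mtime < cutoff:
                if not dry_run:
                    shutil.rmtree(path, ignore_errors=True)
                removed.append(path)
        except OSError:
            continue  # the directory disappeared in the meantime
    return removed
```

Running it with dry_run=True first shows what would be deleted without touching anything, which seems prudent on a shared devserver.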
Project Member

Comment 15 by bugdroid1@chromium.org, Oct 25 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/platform/dev-util/+/3bc974ea3c230299b95afe764c37a87e6bab071e

commit 3bc974ea3c230299b95afe764c37a87e6bab071e
Author: xixuan <xixuan@chromium.org>
Date: Wed Oct 19 00:21:43 2016

devserver: remove temp directory for storing devserver codes.

Currently, when devserver tries to transfer the devserver package, it first copies
the code, minus some unnecessary files, to a temp directory, then transfers the
whole package to the device. This procedure leaves a temp directory on the
devserver that is not deleted after the provision succeeds or fails.

This CL helps the devserver to pass the temp directory to the auto_updater, and
then delete the directory after provision is finished.

BUG= chromium:654953 
TEST=run repair with local autotest and devserver.

Change-Id: I4d0bd4516923a3bd41c455175ca36093e24266c1
Reviewed-on: https://chromium-review.googlesource.com/399989
Commit-Ready: Xixuan Wu <xixuan@chromium.org>
Tested-by: Xixuan Wu <xixuan@chromium.org>
Reviewed-by: Allen Li <ayatane@chromium.org>

[modify] https://crrev.com/3bc974ea3c230299b95afe764c37a87e6bab071e/cros_update_progress.py
[modify] https://crrev.com/3bc974ea3c230299b95afe764c37a87e6bab071e/cros_update.py
[modify] https://crrev.com/3bc974ea3c230299b95afe764c37a87e6bab071e/devserver.py
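The pattern this CL describes (stage a filtered copy in a temp directory, hand it to the updater, remove it afterwards) can be sketched as follows. This is an illustration only, not the code from the CL: with_staged_copy, the run_update callback, and the ignore patterns are hypothetical.

```python
import os
import shutil
import tempfile


def with_staged_copy(src_dir, run_update, ignore=("*.pyc", ".git")):
    """Copy src_dir (minus ignored files) to a temp dir and run run_update on it.

    run_update is a hypothetical callback standing in for the updater. The
    staging directory is removed whether run_update succeeds or raises.
    """
    staging = tempfile.mkdtemp(prefix="devserver-stage-")
    try:
        package = os.path.join(staging, "package")
        shutil.copytree(src_dir, package,
                        ignore=shutil.ignore_patterns(*ignore))
        return run_update(package)
    finally:
        shutil.rmtree(staging, ignore_errors=True)
```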

Please also see related  bug 664360 .
fixed?
Status: Fixed (was: Assigned)
For this particular bug, yes.

For the long-term goal of adding a cron job to clean up files that are left behind for miscellaneous reasons: no.

We can close this for now, and track the long-term goal in  bug 664360 .
Status: Assigned (was: Fixed)
I don't think this was fixed as of Dec 7.  chromeos2-devserver7 was still showing the same pattern of steady increase in the number of processes, and of disk space use.  You can check on viceroy/chromeos.  Unfortunately that devserver hardware failed (maybe overheated? :) so we can't check now.

If it was fixed but the fix was not pushed to that devserver, feel free to close again, but it may be good to check the other devservers.

Thanks!

Cc: ihf@chromium.org dgarr...@chromium.org davidri...@chromium.org dshi@chromium.org
 Issue 664360  has been merged into this issue.
In fact I just checked chromeos4-devserver5 and it took 2 minutes and 12 seconds to run "ls /tmp".  There are 36,000 entries.  The /tmp/cros-updateXXX are about 1.5MB each.  Many of them are from October.

The total size of /tmp is not that big, but the number of entries could be a problem.  readdir(2) could be blocking, and directory operations (adding or removing a file) can take linear time.
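When a directory is this large, it can help to count entries without sorting them. Plain "ls" sorts by default, so it has to read the whole directory before printing anything. A small sketch, with the caveat that count_entries and the 11-character grouping (the length of "cros-update") are just illustrative choices:

```python
import os
from collections import Counter


def count_entries(path, prefix_len=11):
    """Count directory entries grouped by the first prefix_len characters.

    os.scandir yields entries as they are read, with no sorting and no
    per-entry stat call, so memory use stays small even for huge directories.
    """
    counts = Counter()
    with os.scandir(path) as it:
        for entry in it:
            counts[entry.name[:prefix_len]] += 1
    return counts
```

For example, count_entries("/tmp").most_common(5) would list the five most frequent name prefixes and how many entries each has.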

https://chrome-internal-review.googlesource.com/#/c/310135/ has been clearing /tmp/ on every devserver every 12 hours, but one of the devservers still has 36,000 entries in /tmp and most of them are from October?!

I can't check any devserver due to network restrictions, but this is unexpected.
Labels: Hotlist-Fixit
Labels: cros-infra-fixedit-q117
Status: Fixed (was: Assigned)
It turns out that chromeos4-devserver5 had an out-of-date chromeos-admin, which made puppet fail to apply the newest settings to this devserver.

This server also had some wrong settings in its chromiumos repo that blocked 'repo sync', and as a result it had not been updated since December.

Both issues have now been fixed manually. We already have a new 'sync_and_run_puppet' cron job that updates chromeos_admin every 4 hours and ignores any local changes, so this should not be a problem any more.

The other suspicious devserver, chromeos2-devserver7, is offline.

Marking this bug as fixed. Feel free to reopen it if you find more devservers with garbage piling up in /tmp/.

Comment 25 by dchan@google.com, Apr 17 2017

Labels: VerifyIn-59

Comment 26 by dchan@google.com, May 30 2017

Labels: VerifyIn-60
Labels: VerifyIn-61

Comment 28 by dchan@chromium.org, Oct 14 2017

Status: Archived (was: Fixed)
