
Issue 706132


Issue metadata

Status: Fixed
Owner:
Closed: Oct 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Bug

Blocked on:
issue 726530
issue 721552
issue 726540




/var/log/ or /tmp/ on DUT can get full

Project Member Reported by gwendal@chromium.org, Mar 28 2017

Issue description

See https://buganizer.corp.google.com/issues/35648410 error report

I believe the error is on the DUT side; I wonder whether /tmp is getting full.


 
Looking deeper in this qual (gs://chromeos-moblab-quanta/results/a0:3b:2c:63:96:78/c6d5d706-07e0-11e7-b402-a03b2c639678)

We fail very early:
03/26 05:52:54.878 DEBUG|     site_autotest:0194| AUTOTEST_STATUS::START  ----  ----  timestamp=1490478773  localtime=Mar 25 14:52:53 
03/26 05:52:54.879 INFO |        server_job:0153|   START ----  ----  timestamp=1490478773  localtime=Mar 26 05:52:53 
03/26 05:52:54.886 DEBUG|     site_autotest:0194| AUTOTEST_STATUS::ABORT  ----  ----  timestamp=1490478773  localtime=Mar 25 14:52:53 client.bin.job.__init__ failed: [Errno 28] No space left on device
03/26 05:52:54.887 INFO |        server_job:0153|     ABORT ----  ----  timestamp=1490478773  localtime=Mar 26 05:52:53 client.bin.job.__init__ failed: [Errno 28] No space left on device


In ./166-moblab/192.168.231.101/sysinfo/df

tmp                               1915  1134       782      60% /tmp
/dev/mapper/encstateful           7390  1900      5490      26% /var

We are tarring the logs from /var into /tmp. If the logs get large, we will hit an out-of-space issue.

Cc: jrbarnette@chromium.org sbasi@chromium.org
Labels: -Pri-2 Pri-1
Status: Started (was: Untriaged)
Can reproduce:
- create a 6G file at /var/log/messages.7:  fallocate -l 6G messages.7
- run test_that 100.107.3.187 StorageQualBase.test, with most of the tests removed.

We fail at the beginning, collecting log files.


client.bin.job.__init__ failed: [Errno 28] No space left on device

We are overutilizing /tmp.
First we install the client in /tmp/sysinfo/autoserv-yyNd1Q.
Then we copy /var/log and other files there, and finally copy everything over to the server.


Enclosed are the full test_that results, without net*.log and messages.7 in results-1-StorageQualBase.test/sysinfo/var/log/ (truncated) and results-1-StorageQualBase.test/crashinfo.100.107.3.187/var/log/ (full).


The problem is job.sysinfo.log_before_each_test, which calls:
 log.run(log_dir=None, collect_init_status=True)
copying the content of "/var/log" [aka LOG_DIR] into


It even prevents logging from working:
05/25 14:05:45.321 ERROR|     setup_modules:0085| post-reboot sysinfo error:Exception occurred formatting message: 'post-reboot sysinfo error:' using args ()
05/25 14:05:45.322 ERROR|         traceback:0013| Traceback (most recent call last):
05/25 14:05:45.323 ERROR|         traceback:0013|   File "/usr/local/lib64/python2.7/logging/__init__.py", line 883, in emit
05/25 14:05:45.325 ERROR|         traceback:0013|     self.flush()
05/25 14:05:45.327 ERROR|         traceback:0013|   File "/usr/local/lib64/python2.7/logging/__init__.py", line 843, in flush
05/25 14:05:45.328 ERROR|         traceback:0013|     self.stream.flush()
05/25 14:05:45.329 ERROR|         traceback:0013| IOError: [Errno 28] No space left on device


Attachment: test_that_results_OvG4_J.tgz (6.1 MB)
Blockedon: 721552
Solutions:

- Force trimming of /var/log/, removing old files before collecting, so that the copy fits in /tmp.
- Be more selective about what to store in sysinfo: there is no need to store /var/log/messages.X, net.X, ./power_manager/powerd.2017..., /chrome/chrome_2017... since they are already archives.

Currently we store all these files to be able to show their diff later.
Instead, store only a subset in /tmp.
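
A rough sketch of the "be more selective" option, for illustration only. This is not autotest code: the patterns, the helper names (is_archived_log, select_logs) and the example file names are assumptions, and the real naming of rotated logs on a DUT may differ.

```python
import os
import re

# Assumed naming patterns for logs that are already archives.
ARCHIVED_PATTERNS = (
    re.compile(r"\.\d+(\.gz)?$"),  # numeric rotation suffix, e.g. messages.7
    re.compile(r"[._]\d{8}"),      # embedded date stamp (hypothetical format)
)


def is_archived_log(path, patterns=ARCHIVED_PATTERNS):
    """Return True if the file name looks like a rotated or archived log."""
    name = os.path.basename(path)
    return any(p.search(name) for p in patterns)


def select_logs(paths):
    """Keep only the logs that do not look like archives."""
    return [p for p in paths if not is_archived_log(p)]
```

With this filter, a list such as messages, messages.7, net.log, net.3 would be reduced to messages and net.log before anything is copied into /tmp.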
Blockedon: 726530
Components: -Tests Infra>Client>ChromeOS
Blockedon: 726540
Status: Available (was: Started)
Labels: -Pri-1 Pri-2
Owner: dshi@chromium.org
Summary: /var/log/ or /tmp/ on DUT can get full (was: autotest: root cause failure: client.bin.job.__init__ failed: [Errno 28] No space left on device)
May be related to dshi's ongoing work. Reducing to P2. dshi any comment here?

Comment 9 by dshi@chromium.org, Jun 12 2017

We should add a check of free space before collecting each logdir.
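
A minimal sketch of such a check, assuming Python 3 (shutil.disk_usage is not available in the Python 2.7 shown in the traceback above; os.statvfs would be the equivalent there). The helper names and the 50 MB reserve are made up for illustration and are not part of autotest.

```python
import os
import shutil


def dir_size_bytes(path):
    """Return the total size of the files under path, skipping symlinks."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            try:
                if not os.path.islink(full):
                    total += os.path.getsize(full)
            except OSError:
                # The file vanished or is unreadable; ignore it.
                continue
    return total


def has_room_for(logdir, dest, reserve_bytes=50 * 1024 * 1024):
    """Return True if dest can hold a copy of logdir plus a safety reserve."""
    free = shutil.disk_usage(dest).free
    return dir_size_bytes(logdir) + reserve_bytes <= free
```

A collector could call has_room_for("/var/log", "/tmp") before each copy and skip or trim the logdir when it returns False, instead of failing with Errno 28.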

The full collection of /var/log was introduced here:
https://chromium-review.googlesource.com/#/c/335958/

We had some discussion there. I'd like to revert that.

Comment 10 by bugdroid1@chromium.org, Oct 3 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/autotest/+/b152039ff29d7f459f166b3ec9e14320d4275d94

commit b152039ff29d7f459f166b3ec9e14320d4275d94
Author: Dan Shi <dshi@google.com>
Date: Tue Oct 03 03:25:48 2017

[autotest] Do not collect full logs per boot

Autotest per-iteration sysinfo collection collects too much logs and causes
unnecessary test failures.

This change only affects pre-test log collection. Autotest will still:
1. Collect new logs in /var/log generated during the test run, saved under
   [test_name]/sysinfo/var/log_diff.
2. If a test failed due to dut failure, e.g., lost network, unexpected reboot,
   complete /var/log will be collected.

BUG= chromium:706132 
TEST=unittest, local run test

Change-Id: I00f00e494cd41989454097d0b853ccd348dff5e2
Reviewed-on: https://chromium-review.googlesource.com/693042
Commit-Ready: Dan Shi <dshi@google.com>
Tested-by: Dan Shi <dshi@google.com>
Reviewed-by: Gwendal Grignou <gwendal@chromium.org>
Reviewed-by: Steven Bennetts <stevenjb@chromium.org>
Reviewed-by: Ahmed Fakhry <afakhry@chromium.org>

[modify] https://crrev.com/b152039ff29d7f459f166b3ec9e14320d4275d94/client/bin/base_sysinfo.py
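
For readers unfamiliar with the idea in point 1 of the commit message above (keeping only the logs generated during the test run), here is a rough sketch of one way it could work. It is not the code in base_sysinfo.py; the function names and the size-snapshot approach are assumptions made for illustration.

```python
import os


def snapshot_sizes(logdir):
    """Map each file path (relative to logdir) to its current size in bytes."""
    sizes = {}
    for root, _dirs, files in os.walk(logdir):
        for name in files:
            full = os.path.join(root, name)
            try:
                sizes[os.path.relpath(full, logdir)] = os.path.getsize(full)
            except OSError:
                continue
    return sizes


def copy_new_content(logdir, before, dest):
    """Copy only the bytes appended since `before`, plus any new files."""
    for rel, size in snapshot_sizes(logdir).items():
        start = before.get(rel, 0)
        if size < start:
            start = 0  # the file shrank (truncated or rotated): take it whole
        if size == start:
            continue   # nothing new in this file
        target = os.path.join(dest, rel)
        os.makedirs(os.path.dirname(target), exist_ok=True)
        with open(os.path.join(logdir, rel), "rb") as src, \
                open(target, "wb") as out:
            src.seek(start)
            # Reads the whole delta into memory; fine for a sketch.
            out.write(src.read(size - start))
```

Taking snapshot_sizes() before the test costs almost no space in /tmp, and copy_new_content() afterwards writes only what the test itself produced, so a 6G messages.7 that predates the test is never copied.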

Comment 11 by dshi@chromium.org, Oct 5 2017

Status: Fixed (was: Available)
