
Issue 841013


Issue metadata

Status: Fixed
Owner:
Closed: May 2018
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug

Blocked on:
issue 842912




amd64-generic-tot-asan-informational VMTest failing with "No space left on device"

Reported by sammiequon@chromium.org, May 8 2018

Issue description

A lot of the tests are failing with:
ABORT: client.bin.job.__init__ failed: [Errno 28] No space left on device


This has been failing since R68-10657.0.0-b2551015 (early Tuesday).

07:32:54 INFO | autoserv| scp: /usr/local/autotest/packages.checksum: No space left on device
07:32:54 INFO | autoserv| Running (ssh) 'echo B > /usr/local/autotest/tmp/_autotmp_WbkC8sharness-fifo/autoserv.fifo' from '_wait_for_commands|process_output|write|_process_line|run|run_very_slowly'
07:32:54 INFO | autoserv| AUTOTEST_STATUS::		ERROR	desktopui_KillRestart.chrome	desktopui_KillRestart.chrome	timestamp=1525789974	localtime=May 08 07:32:54	Unhandled EOFError:
07:32:54 INFO | autoserv| ERROR	desktopui_KillRestart.chrome	desktopui_KillRestart.chrome	timestamp=1525789974	localtime=May 08 07:32:54	Unhandled EOFError:
07:32:54 INFO | autoserv| AUTOTEST_STATUS::  Traceback (most recent call last):
07:32:54 INFO | autoserv| AUTOTEST_STATUS::    File "/usr/local/autotest/bin/job.py", line 495, in _runtest
07:32:54 INFO | autoserv| AUTOTEST_STATUS::      parallel.fork_waitfor(self.resultdir, pid)
07:32:54 INFO | autoserv| AUTOTEST_STATUS::    File "/usr/local/autotest/bin/parallel.py", line 84, in fork_waitfor
07:32:54 INFO | autoserv| AUTOTEST_STATUS::      _check_for_subprocess_exception(tmp, pid)
07:32:54 INFO | autoserv| AUTOTEST_STATUS::    File "/usr/local/autotest/bin/parallel.py", line 60, in _check_for_subprocess_exception
07:32:54 INFO | autoserv| AUTOTEST_STATUS::      e = pickle.load(file(ename, 'r'))
07:32:54 INFO | autoserv| AUTOTEST_STATUS::    File "/usr/local/lib64/python2.7/pickle.py", line 1378, in load
07:32:54 INFO | autoserv| AUTOTEST_STATUS::      return Unpickler(file).load()
07:32:54 INFO | autoserv| AUTOTEST_STATUS::    File "/usr/local/lib64/python2.7/pickle.py", line 858, in load
07:32:54 INFO | autoserv| AUTOTEST_STATUS::      dispatch[key](self)
07:32:54 INFO | autoserv| AUTOTEST_STATUS::    File "/usr/local/lib64/python2.7/pickle.py", line 880, in load_eof
07:32:54 INFO | autoserv| AUTOTEST_STATUS::      raise EOFError
07:32:54 INFO | autoserv| AUTOTEST_STATUS::  EOFError
07:32:55 INFO | autoserv| AUTOTEST_STATUS::	END ERROR	desktopui_KillRestart.chrome	desktopui_KillRestart.chrome	timestamp=1525789974	localtime=May 08 07:32:54
07:32:55 INFO | autoserv| END ERROR	desktopui_KillRestart.chrome	desktopui_KillRestart.chrome	timestamp=1525789974	localtime=May 08 07:32:54
07:32:55 INFO | autoserv| AUTOTEST_STATUS::END ABORT	----	----	timestamp=1525789975	localtime=May 08 07:32:55	Unhandled OSError: [Errno 28] No space left on device: '/usr/local/autotest/tmp/_autotmp_3BJDAgharness-fifo'
07:32:55 INFO | autoserv| END ABORT	----	----	timestamp=1525789975	localtime=May 08 07:32:55	Unhandled OSError: [Errno 28] No space left on device: '/usr/local/autotest/tmp/_autotmp_3BJDAgharness-fifo'
07:32:55 INFO | autoserv| AUTOTEST_STATUS::  Traceback (most recent call last):
07:32:55 INFO | autoserv| AUTOTEST_STATUS::    File "/usr/local/autotest/bin/job.py", line 1033, in step_engine
07:32:55 INFO | autoserv| AUTOTEST_STATUS::      execfile(self.control, global_control_vars, global_control_vars)
07:32:55 INFO | autoserv| AUTOTEST_STATUS::    File "/usr/local/autotest/control.autoserv", line 21, in <module>
07:32:55 INFO | autoserv| AUTOTEST_STATUS::      job.run_test('desktopui_KillRestart', binary='^chrome$', tag='chrome')
07:32:55 INFO | autoserv| AUTOTEST_STATUS::    File "/usr/local/autotest/bin/job.py", line 1237, in run_test
07:32:55 INFO | autoserv| AUTOTEST_STATUS::      passed = base_client_job.run_test(self, url, *args, **dargs)
07:32:55 INFO | autoserv| AUTOTEST_STATUS::    File "/usr/local/autotest/bin/job.py", line 69, in wrapped
07:32:55 INFO | autoserv| AUTOTEST_STATUS::      self.harness.run_test_complete()
07:32:55 INFO | autoserv| AUTOTEST_STATUS::    File "/usr/local/autotest/bin/harness_autoserv.py", line 73, in run_test_complete
07:32:55 INFO | autoserv| AUTOTEST_STATUS::      self._send_and_wait('AUTOTEST_TEST_COMPLETE')
07:32:55 INFO | autoserv| AUTOTEST_STATUS::    File "/usr/local/autotest/bin/harness_autoserv.py", line 53, in _send_and_wait
07:32:55 INFO | autoserv| AUTOTEST_STATUS::      dir=self.job.tmpdir)
07:32:55 INFO | autoserv| AUTOTEST_STATUS::    File "/usr/local/autotest/common_lib/autotemp.py", line 102, in __init__
07:32:55 INFO | autoserv| AUTOTEST_STATUS::      prefix=prefix, dir=dir)
07:32:55 INFO | autoserv| AUTOTEST_STATUS::    File "/usr/local/lib64/python2.7/tempfile.py", line 333, in mkdtemp
07:32:55 INFO | autoserv| AUTOTEST_STATUS::      _os.mkdir(file, 0700)
07:32:55 INFO | autoserv| AUTOTEST_STATUS::  OSError: [Errno 28] No space left on device: '/usr/local/autotest/tmp/_autotmp_3BJDAgharness-fifo'
07:32:55 INFO | autoserv| Got lock of exit_code_file.
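The abort in this log comes from `tempfile.mkdtemp()` raising `OSError` with errno 28, which is `ENOSPC` ("No space left on device") on Linux, while creating a directory under `/usr/local/autotest/tmp`. A minimal sketch of a pre-flight free-space check and of recognizing that specific error, assuming Python 3 (`shutil.disk_usage` does not exist in the Python 2.7 shown in the traceback; `os.statvfs` would be the rough equivalent there). `has_free_space`, `is_disk_full_error`, `make_tmpdir`, and the 64 MiB threshold are all hypothetical illustrations, not part of autotest:

```python
import errno
import shutil
import tempfile


def has_free_space(path, min_bytes):
    """Return True if the filesystem holding `path` has at least `min_bytes` free."""
    return shutil.disk_usage(path).free >= min_bytes


def is_disk_full_error(exc):
    """Return True if `exc` is the "[Errno 28] No space left on device" error."""
    return isinstance(exc, OSError) and exc.errno == errno.ENOSPC


def make_tmpdir(parent, prefix="_autotmp_", min_bytes=64 * 1024 * 1024):
    """Create a temp dir under `parent`, failing early with a clear message if space is low.

    The 64 MiB default threshold is an arbitrary illustrative value.
    """
    if not has_free_space(parent, min_bytes):
        raise RuntimeError("less than %d bytes free under %s" % (min_bytes, parent))
    try:
        return tempfile.mkdtemp(prefix=prefix, dir=parent)
    except OSError as exc:
        if is_disk_full_error(exc):
            # The disk filled up between the check and the mkdir.
            raise RuntimeError("disk filled up creating a temp dir under %s" % parent) from exc
        raise
```

Checking free space up front turns a late, confusing failure (here an `EOFError` while unpickling, followed by the `OSError` abort) into one explicit message naming the directory that ran out of room.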


https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?id=2553962
https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?id=2554498
Cc: steve...@chromium.org ihf@chromium.org
+ihf +stevenjb

ihf, stevenjb, any ideas on how to go about debugging this, or on running this asan bot on a local machine?
Cc: achuith@chromium.org
+achuith
Cc: adurbin@chromium.org
Labels: -Pri-3 Pri-1
Owner: philipchen@chromium.org
This doesn't seem to be a problem caused by a chrome CL.

The public chromeos waterfall:
https://uberchromegw.corp.google.com/i/chromiumos/waterfall

The ASAN bot in this waterfall:
https://uberchromegw.corp.google.com/i/chromiumos/builders/amd64-generic-asan

It seems to be failing the same tests, though I haven't looked further at the root cause.
The following tests failed on build 24953:
login_LoginSuccess
security_SandboxLinuxUnittests
security_SysLogPermissions
security_ChromiumOSLSM
security_DbusOwners
security_SuidBinaries
security_mprotect
logging_UserCrash
security_SandboxedServices
security_OpenSSLBlacklist
security_Minijail_seccomp
security_ASLR
platform_OSLimits
login_OwnershipRetaken
desktopui_KillRestart
login_MultiUserPolicy
login_UserPolicyKeys
login_RemoteOwnership
security_RootCA
security_RuntimeExecStack
security_Minijail0
logging_CrashSender
security_RootfsStatefulSymlinks
platform_CUPSDaemon
kernel_CryptoAPI
login_OwnershipNotRetaken
login_SameSessionTwice
login_GuestAndActualSession
login_RetrieveActiveSessions

This seems to be a chromeos issue that chromeos sheriffs should probably be taking a look at?
What do the asan runs do differently w.r.t. disk space? Does it create large reports?

Comment 7 by ihf@chromium.org, May 9 2018

ASAN creates larger binaries. Also, the image is created, and the VM is started, with certain parameters so as not to use too many resources on the server that runs the VM. Presumably some of these parameters need to be tweaked to make space.
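One way to check whether larger ASAN artifacts are what fills the partition is to rank the top-level entries under the directory that ran out of space (`/usr/local/autotest` in the logs) by size. A minimal `du`-style sketch, assuming Python 3 on the test image; `dir_sizes` and `largest` are hypothetical helpers, and they sum apparent file sizes, which can differ from the blocks actually allocated on disk:

```python
import os


def dir_sizes(root):
    """Map each immediate child of `root` to the total apparent size (bytes) of regular files under it."""
    sizes = {}
    for entry in os.scandir(root):
        total = 0
        if entry.is_dir(follow_symlinks=False):
            # os.walk does not descend into symlinked directories by default.
            for dirpath, _dirnames, filenames in os.walk(entry.path):
                for name in filenames:
                    path = os.path.join(dirpath, name)
                    if os.path.islink(path):
                        continue  # don't count symlink targets
                    try:
                        total += os.path.getsize(path)
                    except OSError:
                        pass  # file vanished or is unreadable; skip it
        elif entry.is_file(follow_symlinks=False):
            total = entry.stat(follow_symlinks=False).st_size
        # Symlinks and special files at the top level are recorded with size 0.
        sizes[entry.name] = total
    return sizes


def largest(root, n=10):
    """Return the `n` largest (name, bytes) pairs under `root`, biggest first."""
    return sorted(dir_sizes(root).items(), key=lambda kv: kv[1], reverse=True)[:n]
```

For example, `largest("/usr/local/autotest")` run on an ASAN image and on a regular image would show which subdirectories account for the difference.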
Who is a good owner for this? Presumably someone who's pushing asan support?

Comment 9 by ihf@chromium.org, May 9 2018

Presumably anybody who is interested in finding memory allocation issues on ChromeOS.
Cc: nxia@chromium.org
Hmm...so it doesn't look like a specific CL breaks this.
Can someone from the infra team help fix the memory allocation?
re#9: someone enabled asan, right? It didn't just magically happen? I don't have enough knowledge of the requirements for ASAN nor what vmtests provide w.r.t. those requirements. It sounds like you do, but aren't sharing?
ihf@ - would you happen to know someone or someone who might know someone who would be knowledgeable about this bot?
Cc: -sammiequon@chromium.org derat@chromium.org
Cc: vapier@chromium.org
+vapier

vapier@ can you help triage?

Comment 15 by derat@chromium.org, May 14 2018

I don't know anything about this, but I've made a change to increase amd64-generic's stateful partition to 3 GB and started a tryjob to see if it helps. I won't be surprised if I run into trouble due to this exceeding the disk image's size or something like that, though.

Comment 16 by derat@chromium.org, May 14 2018

Blockedon: 842912
amd64-generic-asan is unable to build chromeos-chrome now, which blocks this ( issue 842912 ).

Comment 17 by derat@chromium.org, May 15 2018

Components: Test
Owner: derat@chromium.org
Status: Started (was: Untriaged)
Summary: amd64-generic-tot-asan-informational VMTest failing with "No space left on device" (was: amd64-generic-tot-asan-informational VMTest failing)
My tryjob at http://cros-goldeneye/chromeos/healthmonitoring/buildDetails?buildbucketId=8946512024108427328 passed, so I sent https://crrev.com/c/1058132 for review.

Comment 18 by bugdroid1@chromium.org, May 15 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/overlays/board-overlays/+/53ec8ca21839d122825648da1635a2660acd5c0b

commit 53ec8ca21839d122825648da1635a2660acd5c0b
Author: Daniel Erat <derat@chromium.org>
Date: Tue May 15 06:57:03 2018

overlay-amd64-generic: Bump stateful partition to 3 GB.

The amd64-generic-tot-asan-informational builder's VMTest
stage has started failing after running out of space while
writing to /usr/local/autotest. Increase the stateful
partition's size from 2432 MB to 3072 MB to try to avoid
this.

BUG= chromium:841013 
TEST=none

Change-Id: If31b4edb758296e7d5595b8c67835af04495612d
Reviewed-on: https://chromium-review.googlesource.com/1058132
Commit-Ready: Dan Erat <derat@chromium.org>
Tested-by: Dan Erat <derat@chromium.org>
Reviewed-by: Achuith Bhandarkar <achuith@chromium.org>
Reviewed-by: Ilja H. Friedel <ihf@chromium.org>

[modify] https://crrev.com/53ec8ca21839d122825648da1635a2660acd5c0b/overlay-amd64-generic/scripts/disk_layout.json
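For scale, the bump in this commit works out as below. The sketch only restates the commit message's numbers; it treats "MB" as 1024-based so that 3072 MB matches the "3 GB" in the subject line, which is an assumption about the units. `partition_growth` is a hypothetical helper:

```python
def partition_growth(old_mb, new_mb):
    """Return (absolute increase in MB, relative increase in percent)."""
    delta = new_mb - old_mb
    return delta, 100.0 * delta / old_mb


delta, pct = partition_growth(2432, 3072)
print("stateful grows by %d MB (%.1f%%); 3072 MB = %d x 1024 MB" % (delta, pct, 3072 // 1024))
# prints: stateful grows by 640 MB (26.3%); 3072 MB = 3 x 1024 MB
```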
