amd64-generic-tot-asan-informational VMTest failing with "No space left on device"
Issue description
A lot of the tests are failing: ABORT: client.bin.job.__init__ failed: [Errno 28] No space left on device
,
May 9 2018
This has been failing since R68-10657.0.0-b2551015 (early Tuesday).
07:32:54 INFO | autoserv| scp: /usr/local/autotest/packages.checksum: No space left on device
07:32:54 INFO | autoserv| Running (ssh) 'echo B > /usr/local/autotest/tmp/_autotmp_WbkC8sharness-fifo/autoserv.fifo' from '_wait_for_commands|process_output|write|_process_line|run|run_very_slowly'
07:32:54 INFO | autoserv| AUTOTEST_STATUS:: ERROR desktopui_KillRestart.chrome desktopui_KillRestart.chrome timestamp=1525789974 localtime=May 08 07:32:54 Unhandled EOFError:
07:32:54 INFO | autoserv| ERROR desktopui_KillRestart.chrome desktopui_KillRestart.chrome timestamp=1525789974 localtime=May 08 07:32:54 Unhandled EOFError:
07:32:54 INFO | autoserv| AUTOTEST_STATUS:: Traceback (most recent call last):
07:32:54 INFO | autoserv| AUTOTEST_STATUS:: File "/usr/local/autotest/bin/job.py", line 495, in _runtest
07:32:54 INFO | autoserv| AUTOTEST_STATUS:: parallel.fork_waitfor(self.resultdir, pid)
07:32:54 INFO | autoserv| AUTOTEST_STATUS:: File "/usr/local/autotest/bin/parallel.py", line 84, in fork_waitfor
07:32:54 INFO | autoserv| AUTOTEST_STATUS:: _check_for_subprocess_exception(tmp, pid)
07:32:54 INFO | autoserv| AUTOTEST_STATUS:: File "/usr/local/autotest/bin/parallel.py", line 60, in _check_for_subprocess_exception
07:32:54 INFO | autoserv| AUTOTEST_STATUS:: e = pickle.load(file(ename, 'r'))
07:32:54 INFO | autoserv| AUTOTEST_STATUS:: File "/usr/local/lib64/python2.7/pickle.py", line 1378, in load
07:32:54 INFO | autoserv| AUTOTEST_STATUS:: return Unpickler(file).load()
07:32:54 INFO | autoserv| AUTOTEST_STATUS:: File "/usr/local/lib64/python2.7/pickle.py", line 858, in load
07:32:54 INFO | autoserv| AUTOTEST_STATUS:: dispatch[key](self)
07:32:54 INFO | autoserv| AUTOTEST_STATUS:: File "/usr/local/lib64/python2.7/pickle.py", line 880, in load_eof
07:32:54 INFO | autoserv| AUTOTEST_STATUS:: raise EOFError
07:32:54 INFO | autoserv| AUTOTEST_STATUS:: EOFError
07:32:55 INFO | autoserv| AUTOTEST_STATUS:: END ERROR desktopui_KillRestart.chrome desktopui_KillRestart.chrome timestamp=1525789974 localtime=May 08 07:32:54
07:32:55 INFO | autoserv| END ERROR desktopui_KillRestart.chrome desktopui_KillRestart.chrome timestamp=1525789974 localtime=May 08 07:32:54
07:32:55 INFO | autoserv| AUTOTEST_STATUS::END ABORT ---- ---- timestamp=1525789975 localtime=May 08 07:32:55 Unhandled OSError: [Errno 28] No space left on device: '/usr/local/autotest/tmp/_autotmp_3BJDAgharness-fifo'
07:32:55 INFO | autoserv| END ABORT ---- ---- timestamp=1525789975 localtime=May 08 07:32:55 Unhandled OSError: [Errno 28] No space left on device: '/usr/local/autotest/tmp/_autotmp_3BJDAgharness-fifo'
07:32:55 INFO | autoserv| AUTOTEST_STATUS:: Traceback (most recent call last):
07:32:55 INFO | autoserv| AUTOTEST_STATUS:: File "/usr/local/autotest/bin/job.py", line 1033, in step_engine
07:32:55 INFO | autoserv| AUTOTEST_STATUS:: execfile(self.control, global_control_vars, global_control_vars)
07:32:55 INFO | autoserv| AUTOTEST_STATUS:: File "/usr/local/autotest/control.autoserv", line 21, in <module>
07:32:55 INFO | autoserv| AUTOTEST_STATUS:: job.run_test('desktopui_KillRestart', binary='^chrome$', tag='chrome')
07:32:55 INFO | autoserv| AUTOTEST_STATUS:: File "/usr/local/autotest/bin/job.py", line 1237, in run_test
07:32:55 INFO | autoserv| AUTOTEST_STATUS:: passed = base_client_job.run_test(self, url, *args, **dargs)
07:32:55 INFO | autoserv| AUTOTEST_STATUS:: File "/usr/local/autotest/bin/job.py", line 69, in wrapped
07:32:55 INFO | autoserv| AUTOTEST_STATUS:: self.harness.run_test_complete()
07:32:55 INFO | autoserv| AUTOTEST_STATUS:: File "/usr/local/autotest/bin/harness_autoserv.py", line 73, in run_test_complete
07:32:55 INFO | autoserv| AUTOTEST_STATUS:: self._send_and_wait('AUTOTEST_TEST_COMPLETE')
07:32:55 INFO | autoserv| AUTOTEST_STATUS:: File "/usr/local/autotest/bin/harness_autoserv.py", line 53, in _send_and_wait
07:32:55 INFO | autoserv| AUTOTEST_STATUS:: dir=self.job.tmpdir)
07:32:55 INFO | autoserv| AUTOTEST_STATUS:: File "/usr/local/autotest/common_lib/autotemp.py", line 102, in __init__
07:32:55 INFO | autoserv| AUTOTEST_STATUS:: prefix=prefix, dir=dir)
07:32:55 INFO | autoserv| AUTOTEST_STATUS:: File "/usr/local/lib64/python2.7/tempfile.py", line 333, in mkdtemp
07:32:55 INFO | autoserv| AUTOTEST_STATUS:: _os.mkdir(file, 0700)
07:32:55 INFO | autoserv| AUTOTEST_STATUS:: OSError: [Errno 28] No space left on device: '/usr/local/autotest/tmp/_autotmp_3BJDAgharness-fifo'
07:32:55 INFO | autoserv| Got lock of exit_code_file.
https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?id=2553962
https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?id=2554498
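Note on the first traceback: the `EOFError` raised from `pickle.load` is what Python reports when the file being unpickled is empty. One plausible reading (an assumption, not confirmed anywhere in this thread) is that the test subprocess created its exception file but could not write to it because the disk was already full, so the `EOFError` is a secondary symptom of the same `[Errno 28]`. The sketch below is a minimal Python 3 illustration of that behavior; the log above is from Python 2.7, and `load_pickled_exception` and the file names are hypothetical, not autotest code.

```python
import os
import pickle
import tempfile


def load_pickled_exception(path):
    """Return the unpickled object, or None if the file is empty."""
    try:
        with open(path, "rb") as f:
            return pickle.load(f)
    except EOFError:
        # pickle.load raises EOFError when there are no bytes to read.
        return None


with tempfile.TemporaryDirectory() as tmpdir:
    # Simulate a file that was created but never written (e.g. disk full).
    empty_path = os.path.join(tmpdir, "error-empty")
    open(empty_path, "wb").close()
    empty_result = load_pickled_exception(empty_path)

    # For comparison, a file that was written successfully.
    good_path = os.path.join(tmpdir, "error-good")
    with open(good_path, "wb") as f:
        pickle.dump(ValueError("boom"), f)
    good_result = load_pickled_exception(good_path)

print(empty_result)
print(repr(good_result))
```

If that reading is right, fixing the disk-space problem should make both tracebacks go away.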
,
May 9 2018
+ihf +stevenjb ihf, stevenjb, any ideas on how to go about debugging this / running this asan bot on a local machine?
,
May 9 2018
+achuith
,
May 9 2018
This doesn't seem to be a problem caused by a chrome CL.
The public chromeos waterfall: https://uberchromegw.corp.google.com/i/chromiumos/waterfall
The ASAN bot in this waterfall: https://uberchromegw.corp.google.com/i/chromiumos/builders/amd64-generic-asan
It seems to be failing the same tests, though I haven't looked further at root cause.
Following tests failed on build 24953:
login_LoginSuccess
security_SandboxLinuxUnittests
security_SysLogPermissions
security_ChromiumOSLSM
security_DbusOwners
security_SuidBinaries
security_mprotect
logging_UserCrash
security_SandboxedServices
security_OpenSSLBlacklist
security_Minijail_seccomp
security_ASLR
platform_OSLimits
login_OwnershipRetaken
desktopui_KillRestart
login_MultiUserPolicy
login_UserPolicyKeys
login_RemoteOwnership
security_RootCA
security_RuntimeExecStack
security_Minijail0
logging_CrashSender
security_RootfsStatefulSymlinks
platform_CUPSDaemon
kernel_CryptoAPI
login_OwnershipNotRetaken
login_SameSessionTwice
login_GuestAndActualSession
login_RetrieveActiveSessions
This seems to be a chromeos issue that chromeos sheriffs should probably be taking a look at?
,
May 9 2018
What do the asan runs do differently w.r.t. disk space? Does it create large reports?
,
May 9 2018
ASAN creates larger binaries. Also, the image is created with certain parameters, and the VM is started so that it does not use too many resources on the server that runs it. Presumably some of those parameters need to be tweaked to make space.
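For anyone who wants to check how close a filesystem is to full before or during a run, here is a minimal sketch using `os.statvfs`. It reports both free space and free inodes, since `mkdir` can fail with `[Errno 28]` when either runs out. The helper name `free_space` and the idea of pointing it at `/usr/local/autotest` inside the VM are assumptions for illustration, not something the autotest harness provides.

```python
import os


def free_space(path):
    """Return (free MiB, free inodes) available to unprivileged users on path's filesystem."""
    st = os.statvfs(path)
    free_mib = st.f_bavail * st.f_frsize // (1024 * 1024)
    # Some filesystems report 0 here because they do not track inodes this way.
    free_inodes = st.f_favail
    return free_mib, free_inodes


# "." stands in for the directory of interest (e.g. /usr/local/autotest in the VM).
mib, inodes = free_space(".")
print("free MiB: %d, free inodes: %d" % (mib, inodes))
```

Running this in the VM before and after the autotest packages are copied over would show how much of the stateful partition the ASAN build actually consumes.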
,
May 9 2018
Who is a good owner for this? Presumably someone who's pushing asan support?
,
May 9 2018
Presumably anybody who is interested in finding memory allocation issues on ChromeOS.
,
May 9 2018
Hmm...so it doesn't look like a specific CL breaks this. Can someone from infra team help fix memory allocation?
,
May 9 2018
re#9: someone enabled asan, right? It didn't just magically happen? I don't have enough knowledge of the requirements for ASAN nor what vmtests provide w.r.t. those requirements. It sounds like you do, but aren't sharing?
,
May 11 2018
ihf@ - would you happen to know someone, or someone who might know someone, who is knowledgeable about this bot?
,
May 14 2018
,
May 14 2018
+vapier vapier@ can you help triage?
,
May 14 2018
I don't know anything about this, but I've made a change to increase amd64-generic's stateful partition to 3 GB and started a tryjob to see if it helps. I won't be surprised if I run into trouble due to this exceeding the disk image's size or something like that, though.
,
May 14 2018
amd64-generic-asan is unable to build chromeos-chrome now, which blocks this ( issue 842912 ).
,
May 15 2018
My tryjob at http://cros-goldeneye/chromeos/healthmonitoring/buildDetails?buildbucketId=8946512024108427328 passed, so I sent https://crrev.com/c/1058132 for review.
,
May 15 2018
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/overlays/board-overlays/+/53ec8ca21839d122825648da1635a2660acd5c0b

commit 53ec8ca21839d122825648da1635a2660acd5c0b
Author: Daniel Erat <derat@chromium.org>
Date: Tue May 15 06:57:03 2018

overlay-amd64-generic: Bump stateful partition to 3 GB.

The amd64-generic-tot-asan-informational builder's VMTest stage has started failing after running out of space while writing to /usr/local/autotest. Increase the stateful partition's size from 2432 MB to 3072 MB to try to avoid this.

BUG= chromium:841013
TEST=none

Change-Id: If31b4edb758296e7d5595b8c67835af04495612d
Reviewed-on: https://chromium-review.googlesource.com/1058132
Commit-Ready: Dan Erat <derat@chromium.org>
Tested-by: Dan Erat <derat@chromium.org>
Reviewed-by: Achuith Bhandarkar <achuith@chromium.org>
Reviewed-by: Ilja H. Friedel <ihf@chromium.org>

[modify] https://crrev.com/53ec8ca21839d122825648da1635a2660acd5c0b/overlay-amd64-generic/scripts/disk_layout.json
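For reference, the arithmetic behind the size bump in the commit message can be sketched as follows. This assumes the commit's "MB" and "GB" are 1024-based units (which is what makes 3072 MB equal 3 GB); the variable names are illustrative only and do not come from disk_layout.json.

```python
OLD_MIB = 2432  # previous stateful partition size, from the commit message
NEW_MIB = 3072  # new stateful partition size, from the commit message

delta_mib = NEW_MIB - OLD_MIB
new_gib = NEW_MIB / 1024
percent_increase = round(100.0 * delta_mib / OLD_MIB, 1)

print("added space: %d MiB" % delta_mib)
print("new size: %.1f GiB" % new_gib)
print("increase: %.1f%%" % percent_increase)
```

So the change adds 640 MB of headroom on the stateful partition, roughly a quarter more than before.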
,
May 15 2018
VMTest passed in http://cros-goldeneye/chromeos/healthmonitoring/buildDetails?buildbucketId=8946471934661024800.
Comment 1 by sammiequon@chromium.org, May 9 2018