mac-rel out of disk space
Issue description
mac-rel bot is purple: https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/mac-rel/2166 ---- zip -yr1 /b/s/w/ir/cache/chrome_staging/pnacl.zip pnacl adding: pnacl/ (stored 0%) adding: pnacl/pnacl/ (stored 0%) adding: pnacl/pnacl/pnacl_public_pnacl_json (deflated 44%) adding: pnacl/pnacl/pnacl_public_x86_64_crtbegin_for_eh_o (deflated 62%) adding: pnacl/pnacl/pnacl_public_x86_64_crtbegin_o (deflated 65%) adding: pnacl/pnacl/pnacl_public_x86_64_crtend_o (deflated 63%) adding: pnacl/pnacl/pnacl_public_x86_64_ld_nexe (deflated 58%) adding: pnacl/pnacl/pnacl_public_x86_64_libcrt_platform_a (deflated 71%) adding: pnacl/pnacl/pnacl_public_x86_64_libgcc_a (deflated 84%) adding: pnacl/pnacl/pnacl_public_x86_64_libpnacl_irt_shim_a (deflated 71%) adding: pnacl/pnacl/pnacl_public_x86_64_libpnacl_irt_shim_dummy_a (deflated 62%) adding: pnacl/pnacl/pnacl_public_x86_64_pnacl_llc_nexe (deflated 58%) adding: pnacl/pnacl/pnacl_public_x86_64_pnacl_sz_nexe zip I/O error: No space left on device zip error: Output file write failure (write error on zip file) Took 0.423719 seconds to create zip.
Traceback (most recent call last): File "/b/s/w/ir/kitchen-checkout/build/scripts/slave/chromium/archive_build.py", line 663, in <module> sys.exit(main()) File "/b/s/w/ir/kitchen-checkout/build/scripts/slave/chromium/archive_build.py", line 659, in main return s.ArchiveBuild() File "/b/s/w/ir/kitchen-checkout/build/scripts/slave/chromium/archive_build.py", line 484, in ArchiveBuild [f['filename'] for f in archives_list[archive_name]])[1] File "/b/s/w/ir/kitchen-checkout/build/scripts/slave/chromium/archive_build.py", line 305, in CreateArchiveFile zip_file_list, zip_name) File "/b/s/w/ir/kitchen-checkout/build/scripts/common/archive_utils.py", line 327, in CreateArchive raise StagingError('Failed to make zip package %s' % zip_file) common.archive_utils.StagingError: Failed to make zip package /b/s/w/ir/cache/chrome_staging/pnacl.zip step returned non-zero exit code: 1 ----
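The log above shows the zip step failing partway through with "No space left on device". As a rough illustration only, a staging script could check free space before it starts writing an archive. The helper name `has_room_for_archive` and the 1 GiB default margin are assumptions made for this sketch; nothing in the thread says archive_utils.py does anything like this.

```python
import shutil


def has_room_for_archive(staging_dir, needed_bytes, margin_bytes=1 << 30):
    """Return True if staging_dir's filesystem has needed_bytes plus a margin free.

    Hypothetical helper: the name and the default margin are assumptions,
    not part of the real archive scripts.
    """
    free = shutil.disk_usage(staging_dir).free
    return free >= needed_bytes + margin_bytes
```

A call such as `has_room_for_archive("/b/s/w/ir/cache/chrome_staging", 200 * 1024**2)` would report the shortage before zip runs. The check is inherently racy, since other processes can consume the space between the check and the write, so it can only make the failure clearer, not prevent it.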
,
Dec 17
Oh, that was a bad paste. Where did all the line endings go? I'll look through the CLs and see if any changes could be causing this. e.g. forgetting to strip some binaries.
,
Dec 17
There are no nacl or build changes in the blamelist. Anything interesting on the machine? What's taking up all the space?
,
Dec 17
/.fseventsd has 28 GB of files. This is a known infra issue (more in bug 905110). I'll try deleting that directory to see if it helps. There are 51 GB of obj files in the //out/Release directory. Not sure if that's larger than expected or not. Right now the bot is at 63% usage of its disk (145 GB). Roughly 70 GB of that is accounted for by the two directories I just listed. Not sure where the rest is coming from.
,
Dec 17
The current build will likely finish in ~50 minutes. If it fails again, can we stop the bot at that moment, so we can examine where the disk space is going? [speculation] It may not be a nacl issue if something else is larger than normal, eats up most of the disk space, and leaves the nacl archive step to fail.
,
Dec 17
Looking over prior green bot runs, the build artifacts being uploaded are:
chrome-mac.zip: 83 MB
content-shell.zip: 73 MB
browser_tests: 166 MB
pnacl.zip: 7 MB
remoting-me2me-host-mac.zip: 37 MB
Nothing particularly big.
,
Dec 18
The build directory right now is about 80 GB. The bot has 168 GB used. I'm not sure where the other 80 GB is coming from.
,
Dec 18
Does "du -d 1 -h -x /" account for all the disk space usage?
,
Dec 18
Probably. It's very slow to run though.
,
Dec 18
Also, I paused the bot on https://luci-scheduler.appspot.com/jobs/chromium/mac-rel to try to debug the disk space issues.
,
Dec 18
"du -d -h -x /" gave 195 GB, which is about what the bot is using.
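For comparing runs over time, a per-directory breakdown in the spirit of the du invocations discussed here can be approximated in Python. This is a sketch under stated assumptions: it sums apparent file sizes (`st_size`) rather than allocated blocks, so its totals will differ somewhat from du, it assumes a POSIX system, and the function names `tree_size` and `usage_by_child` are made up for illustration.

```python
import os


def tree_size(root, device):
    """Sum apparent file sizes under root, staying on one device (like du -x)."""
    total = 0
    for dirpath, dirnames, filenames in os.walk(root, onerror=lambda err: None):
        kept = []
        for name in dirnames:
            try:
                # Prune directories that live on a different filesystem.
                if os.lstat(os.path.join(dirpath, name)).st_dev == device:
                    kept.append(name)
            except OSError:
                pass  # Unreadable entry: skip it rather than abort.
        dirnames[:] = kept
        for name in filenames:
            try:
                total += os.lstat(os.path.join(dirpath, name)).st_size
            except OSError:
                pass
    return total


def usage_by_child(root):
    """Map each immediate child of root to its total apparent size in bytes."""
    device = os.lstat(root).st_dev
    sizes = {}
    with os.scandir(root) as entries:
        for entry in entries:
            try:
                st = entry.stat(follow_symlinks=False)
            except OSError:
                continue
            if st.st_dev != device:
                continue  # Mounted elsewhere: not part of this disk's usage.
            if entry.is_dir(follow_symlinks=False):
                sizes[entry.name] = tree_size(entry.path, device)
            else:
                sizes[entry.name] = st.st_size
    return sizes
```

Like du, this walks every file, so on a checkout of this size it would also be slow. Permission errors are silently skipped, which means an unprivileged run can under-report.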
,
Dec 18
The compile step is reaching the end, so it should be linking binaries together and possibly using up the remaining 37 GB. The build should finish in about 10 minutes.
,
Dec 18
... and this run went green. Did deleting files from /.fseventsd prevent it from falling over?
,
Dec 18
I'm on the bot right now, looking at disk space. Here are the sizes of a few directories:

Chromium checkout: 160 GB
//out/Release: 144 GB
//out/Release/obj: 127 GB

Directory usage from "/":

0B .DS_Store
0B .PKInstallSandboxManager
0B .PKInstallSandboxManager-SystemSoftware
12K .Spotlight-V100
0B .Trashes
0B .file
13G .fseventsd
0B .vol
7.0G Applications
1.3G Library
0B Network
5.2G System
215M Users
4.0K Volumes
178G b
2.5M bin
0B cores
44K creds
4.5K dev
4.0K etc
1.0K home
4.0K installer.failurerequests
1.0K net
589M opt
6.1G private
1.2M sbin
4.0K tmp
486M usr
4.0K var
213G total

The chromium checkout is 90% of the usage of /b. There's also a 15 GB git cache, and then probably very small amounts of swarming data and other files. The delete command for /.fseventsd is still running, which should hopefully free up 13 GB. Not sure what else we can do though.
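To turn a listing like the one above into shares of the total, the human-readable sizes can be parsed and divided by the "total" row. This is a minimal sketch that assumes du-style "SIZE NAME" lines with binary units (K, M, G, T). The helper names `parse_size` and `shares` are illustrative only.

```python
# Binary multipliers for du -h style suffixes (assumed, as on BSD du).
UNITS = {"B": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_size(text):
    """Convert a string such as '7.0G' or '0B' to a number of bytes."""
    number, unit = text[:-1], text[-1].upper()
    return float(number) * UNITS[unit]


def shares(du_lines, total_label="total"):
    """Return each entry's fraction of the row labelled total_label."""
    sizes = {}
    for line in du_lines:
        size_text, name = line.split(None, 1)
        sizes[name.strip()] = parse_size(size_text)
    total = sizes.pop(total_label)
    return {name: size / total for name, size in sizes.items()}
```

Fed the rows `178G b`, `13G .fseventsd` and `213G total`, this puts /b at roughly 84% of the disk and /.fseventsd at roughly 6%, which matches the picture that the checkout dominates usage.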
,
Dec 18
Re #13: I'd guess so. Looks to have deleted about 15 GB, which, if it was failing to create a 160 MB zip file, would probably help.
,
Dec 18
Resuming the triggering, since the last build was green.
,
Dec 18
Is the delete command for /.fseventsd just a temporary fix? i.e. /.fseventsd will just fill up eventually again. How long until that happens? I have no idea what is "normal" for this bot, but in general, I assume the disk space usage in /b is going to continue to grow. Would the following be a reasonable set of actions? 1) Let the bot resume its builds. 2) Cross our fingers and hope it doesn't run out of disk space again for a few days. 3) Look into giving it more disk space.
,
Dec 18
The /.fseventsd fix should be permanent, but we should verify later that it actually worked. Supposedly we can stop that directory from growing by dropping a special file in there, which Puppet now installs. I think the plan is to wait a few days and see if this happens again, and monitor the disk usage and size of the checkout. 160 GB seems really big to me, but maybe if it doesn't grow it'll be fine. Next trooper, when you have time, can you ssh onto the bot and check whether there are files in /.fseventsd, and the size of the chromium checkout? Numbers like what I posted in #14 would be helpful to see how the bot changes over time.
,
Dec 18
I'm going to replicate the build on a Mac locally and see how much space it takes. Then I'll check out the code from 10000 revisions prior to that and see what the difference is, as a rough estimate of disk growth in the out/ directory. +ellyjones FYI, and to CC others if needed.
,
Dec 18
$ sudo du -s -h -x /.fseventsd /b
374M /.fseventsd
13G /b
,
Dec 18
Stephen, do the numbers that I posted indicate the /.fseventsd fix worked?
,
Dec 18
13 GB seems low for /b.
/.fseventsd shouldn't have any files though. I just ssh-ed onto the bot, and it still has files.
$ ls /.fseventsd | wc -l
6718
$ cat /.fseventsd/no_log
# This file is managed by Puppet.
# https://crbug.com/905110
# http://blog.hostilefork.com/trashes-fseventsd-and-spotlight-v100
This is worrying. Looks like the fix we thought would work isn't working.
Good news is the growth rate of the disk size isn't horrible. I'll post this on bug 905110.
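The manual check above (count the entries, confirm the no_log marker exists) could be scripted so that each trooper records the same numbers. A minimal sketch follows. The function name `fseventsd_status` and the shape of the returned dictionary are assumptions, and reading /.fseventsd may need elevated privileges on some machines.

```python
import os


def fseventsd_status(directory="/.fseventsd", marker="no_log"):
    """Report whether the marker file exists and how many other entries there are."""
    names = os.listdir(directory)
    others = [name for name in names if name != marker]
    return {"marker_present": marker in names, "other_files": len(others)}
```

If the marker is doing its job, `other_files` should stay flat between checks. A count that keeps climbing while `marker_present` is true would confirm what the listing above suggests.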
,
Dec 18
160GB isn't outrageously big if you do a full build with symbols. 13GB is close to what I'd expect with a public checkout but nothing built. mac-rel is doing clobber builds, so make sure you know at which point you're measuring things :).
,
Dec 18
Re #c24 / measuring at the right time: https://viceroy.corp.google.com/chrome_infra/Machines/per_machine?hostname=vm260-m9&duration=1d&refresh=-1 The disk usage graph is... very interesting :-)
,
Today
(13 hours ago)
The following revision refers to this bug:
https://chrome-internal.googlesource.com/infra/puppet/+/95077deba7d2bdc990d7d76f914f64b63f36cc2e

commit 95077deba7d2bdc990d7d76f914f64b63f36cc2e
Author: Stephen Martinis <martiniss@google.com>
Date: Tue Jan 22 17:58:32 2019
Comment 1 by martiniss@chromium.org
, Dec 17