
Issue 898586

Starred by 1 user

Issue metadata

Status: Closed
Owner:
Closed: Oct 24
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Bug




Jumbo, ios-simulator, mac dbg builders on waterfall are out of space.

Project Member Reported by dalecur...@chromium.org, Oct 24

Issue description

FAILED: headless_browsertests 
python "../../build/toolchain/gcc_link_wrapper.py" --output="./headless_browsertests" -- ../../third_party/llvm-build/Release+Asserts/bin/clang++ -Wl,--fatal-warnings -fPIC -Wl,-z,noexecstack -Wl,-z,now -Wl,-z,relro -Wl,-z,defs -Wl,--as-needed -fuse-ld=lld -Wl,--icf=all -Wl,--color-diagnostics -m64 -Werror -Wl,-O2 -Wl,--gc-sections -rdynamic -nostdlib++ --sysroot=../../build/linux/debian_sid_amd64-sysroot -L../../build/linux/debian_sid_amd64-sysroot/usr/local/lib/x86_64-linux-gnu -Wl,-rpath-link=../../build/linux/debian_sid_amd64-sysroot/usr/local/lib/x86_64-linux-gnu -L../../build/linux/debian_sid_amd64-sysroot/lib/x86_64-linux-gnu -Wl,-rpath-link=../../build/linux/debian_sid_amd64-sysroot/lib/x86_64-linux-gnu -L../../build/linux/debian_sid_amd64-sysroot/usr/lib/x86_64-linux-gnu -Wl,-rpath-link=../../build/linux/debian_sid_amd64-sysroot/usr/lib/x86_64-linux-gnu -Wl,-rpath-link=. -Wl,--disable-new-dtags -o "./headless_browsertests" -Wl,--start-group @"./headless_browsertests.rsp"  -Wl,--end-group   -ldl -lpthread -lrt -lgmodule-2.0 -lgobject-2.0 -lgthread-2.0 -lglib-2.0 -lnss3 -lnssutil3 -lsmime3 -lplds4 -lplc4 -lnspr4 -lresolv -lgio-2.0 -lexpat -luuid -ldbus-1 -lXext -lX11 -lXcomposite -lXrender -lm -lX11-xcb -lxcb -lXcursor -lXdamage -lXfixes -lXi -lXtst -lXss -lXrandr -lasound -lz -lpangocairo-1.0 -lpango-1.0 -lcairo -lpci -latk-1.0 -latk-bridge-2.0 -latspi -lcups 
ld.lld: error: failed to open ./headless_browsertests: No space left on device
clang: error: linker command failed with exit code 1 (use -v to see invocation)
[49909/52113] LINK ./media_blink_unittests
FAILED: media_blink_unittests 
python "../../build/toolchain/gcc_link_wrapper.py" --output="./media_blink_unittests" -- ../../third_party/llvm-build/Release+Asserts/bin/clang++ -Wl,--fatal-warnings -fPIC -Wl,-z,noexecstack -Wl,-z,now -Wl,-z,relro -Wl,-z,defs -Wl,--as-needed -fuse-ld=lld -Wl,--icf=all -Wl,--color-diagnostics -m64 -Werror -Wl,-O2 -Wl,--gc-sections -rdynamic -nostdlib++ --sysroot=../../build/linux/debian_sid_amd64-sysroot -L../../build/linux/debian_sid_amd64-sysroot/usr/local/lib/x86_64-linux-gnu -Wl,-rpath-link=../../build/linux/debian_sid_amd64-sysroot/usr/local/lib/x86_64-linux-gnu -L../../build/linux/debian_sid_amd64-sysroot/lib/x86_64-linux-gnu -Wl,-rpath-link=../../build/linux/debian_sid_amd64-sysroot/lib/x86_64-linux-gnu -L../../build/linux/debian_sid_amd64-sysroot/usr/lib/x86_64-linux-gnu -Wl,-rpath-link=../../build/linux/debian_sid_amd64-sysroot/usr/lib/x86_64-linux-gnu -Wl,-rpath-link=. -Wl,--disable-new-dtags -o "./media_blink_unittests" -Wl,--start-group @"./media_blink_unittests.rsp"  -Wl,--end-group   -ldl -lpthread -lrt -lgmodule-2.0 -lgobject-2.0 -lgthread-2.0 -lglib-2.0 -lnss3 -lnssutil3 -lsmime3 -lplds4 -lplc4 -lnspr4 -lexpat -luuid -lX11 -lX11-xcb -lxcb -lXcomposite -lXcursor -lXdamage -lXext -lXfixes -lXi -lXrender -lXtst -lXrandr -lresolv -lgio-2.0 -lpci -lXss -lasound -lm -lz -lpangocairo-1.0 -lpango-1.0 -lcairo -ldbus-1 
ld.lld: error: failed to open ./media_blink_unittests: No space left on device
clang: error: linker command failed with exit code 1 (use -v to see invocation)
[49910/52113] ACTION //extensions/shell/installer/linux:app_shell_unstable_deb(//build/toolchain/linux:clang_x64)
FAILED: chromium-app-shell-unstable_72.0.3591.0-1_amd64.deb 
python ../../build/gn_run_binary.py app_shell_installer/debian/build.sh -a x64 -b . -c unstable -d chromium -o . -s ../../build/linux/debian_sid_amd64-sysroot
install: error writing '/b/swarming/w/ir/cache/builder/src/out/Release/app-shell-deb-staging-unstable//opt/chromium.org/app-shell-unstable/app_shell': No space left on device
install: failed to extend '/b/swarming/w/ir/cache/builder/src/out/Release/app-shell-deb-staging-unstable//opt/chromium.org/app-shell-unstable/app_shell': No space left on device
app_shell_installer/debian/build.sh failed with exit code 1
[49911/52113] LINK ./services_unittests
FAILED: services_unittests 
python "../../build/toolchain/gcc_link_wrapper.py" --output="./services_unittests" -- ../../third_party/llvm-build/Release+Asserts/bin/clang++ -Wl,--fatal-warnings -fPIC -Wl,-z,noexecstack -Wl,-z,now -Wl,-z,relro -Wl,-z,defs -Wl,--as-needed -fuse-ld=lld -Wl,--icf=all -Wl,--color-diagnostics -m64 -Werror -Wl,-O2 -Wl,--gc-sections -rdynamic -nostdlib++ --sysroot=../../build/linux/debian_sid_amd64-sysroot -L../../build/linux/debian_sid_amd64-sysroot/usr/local/lib/x86_64-linux-gnu -Wl,-rpath-link=../../build/linux/debian_sid_amd64-sysroot/usr/local/lib/x86_64-linux-gnu -L../../build/linux/debian_sid_amd64-sysroot/lib/x86_64-linux-gnu -Wl,-rpath-link=../../build/linux/debian_sid_amd64-sysroot/lib/x86_64-linux-gnu -L../../build/linux/debian_sid_amd64-sysroot/usr/lib/x86_64-linux-gnu -Wl,-rpath-link=../../build/linux/debian_sid_amd64-sysroot/usr/lib/x86_64-linux-gnu -Wl,-rpath-link=. -Wl,--disable-new-dtags -o "./services_unittests" -Wl,--start-group @"./services_unittests.rsp"  -Wl,--end-group   -ldl -lpthread -lrt -lgmodule-2.0 -lgobject-2.0 -lgthread-2.0 -lglib-2.0 -lnss3 -lnssutil3 -lsmime3 -lplds4 -lplc4 -lnspr4 -lX11 -lX11-xcb -lxcb -lXcomposite -lXcursor -lXdamage -lXext -lXfixes -lXi -lXrender -lXtst -lexpat -luuid -lresolv -lgio-2.0 -lm -lXss -lXrandr -latk-1.0 -latk-bridge-2.0 -lpangocairo-1.0 -lpango-1.0 -lcairo -lpci -lasound -lz -ldbus-1 
ld.lld: error: failed to open ./services_unittests: No space left on device
clang: error: linker command failed with exit code 1 (use -v to see invocation)
ninja: build stopped: subcommand failed.
step returned non-zero exit code: 1

https://logs.chromium.org/logs/chromium/buildbucket/cr-buildbucket.appspot.com/8931764273626281984/+/steps/compile/0/stdout
 
Components: Infra
Actually, this may affect more than the Jumbo builder: ios-simulator just reported being out of space too.

thon ../../build/toolchain/mac/linker_driver.py xcrun lipo -create -output obj/ios/chrome/test/earl_grey/ios_chrome_smoke_egtests obj/ios/chrome/test/earl_grey/x64/ios_chrome_smoke_egtests ios_clang_x86/obj/ios/chrome/test/earl_grey/x86/ios_chrome_smoke_egtests
fatal error: /b/s/w/ir/cache/xcode_ios_10a254a.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/lipo: can't write to output file: obj/ios/chrome/test/earl_grey/ios_chrome_smoke_egtests.lipo (No space left on device)
Traceback (most recent call last):
  File "../../build/toolchain/mac/linker_driver.py", line 229, in <module>
    Main(sys.argv)
  File "../../build/toolchain/mac/linker_driver.py", line 79, in Main
    subprocess.check_call(compiler_driver_args)
  File "/b/s/w/ir/cipd_bin_packages/lib/python2.7/subprocess.py", line 186, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['xcrun', 'lipo', '-create', '-output', 'obj/ios/chrome/test/earl_grey/ios_chrome_smoke_egtests', 'obj/ios/chrome/test/earl_grey/x64/ios_chrome_smoke_egtests', 'ios_clang_x86/obj/ios/chrome/test/earl_grey/x86/ios_chrome_smoke_egtests']' returned non-zero exit status 1

Cc: gbeaty@chromium.org jbudorick@chromium.org
Status: Untriaged (was: Unconfirmed)
Summary: Jumbo and ios-simulator builders on waterfall are out of space. (was: Jumbo builder on waterfall is out of space.)
Summary: Jumbo, ios-simulator, mac dbg builders on waterfall are out of space. (was: Jumbo and ios-simulator builders on waterfall are out of space. )
Labels: -Pri-1 Pri-0
Owner: sergeybe...@chromium.org
Status: Assigned (was: Untriaged)
=> sergeyberezin, who is looking at this until jbudorick@ gets back to his desk.
Will take the bug for now.

Looking at the disk usage patterns across the fleet, it appears to be affecting only Mac, and primarily swarming bots in golo for 10.12.5 and 10.13.* versions (which is where we run most tests). The hardest hit seems to be 10.13.3, as the "max" disk space hits 100%: http://shortn/_pknNAce3PN (unlike all the other versions, which still have some room).
The 20 hardest-hit bots: http://shortn/_YNZtcPE2IY
Let's see what they serve...
BTW, it all seems to have started around 10:30am PDT.
ios-simulator:

vm138-m9
vm155-m9
vm63-m9
vm143-m9
... All of the 100% out of disk bots seem to be ios-simulator.
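For checking a single bot by hand, a minimal Python sketch of the "percent of disk used" figure discussed above might look like the following. The function names and the idea of a minimum-free-bytes threshold are illustrative assumptions, not part of the swarming or fleet-monitoring tooling.

```python
import shutil


def percent_used(total_bytes: int, free_bytes: int) -> float:
    """Return the percentage of a volume that is not free."""
    return 100.0 * (total_bytes - free_bytes) / total_bytes


def has_headroom(path: str, min_free_bytes: int) -> bool:
    """Return True if the volume holding `path` has at least min_free_bytes free.

    min_free_bytes is a hypothetical threshold chosen by the caller.
    """
    usage = shutil.disk_usage(path)
    return usage.free >= min_free_bytes


if __name__ == "__main__":
    # Report usage for the volume that holds the current directory.
    usage = shutil.disk_usage(".")
    pct = percent_used(usage.total, usage.free)
    print(f"{pct:.1f}% used, {usage.free / 1e9:.1f} GB free")
```

A bot reported at 100% in the dashboards would correspond to `free` being at or near zero here.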
At least the Jumbo builder healed itself, it seems: http://shortn/_Cwh9S4un3w (swarming probably cleared some caches once it hit 100% disk usage).
https://ci.chromium.org/p/chromium/builders/luci.chromium.try/ios-simulator/121499 is the first ios-simulator build that I see failing with an out-of-disk-space error, triggered at 2018-10-24 10:29 AM (PDT).
https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Win10%20Tests%20x64%20%28dbg%29/3932 mentioned earlier fails in an isolated swarming task, not on the main machine, so I'd say it's different.
https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/mac-dbg/1349 is related, triggered 2018-10-24 11:12 AM (PDT). But like Jumbo, it healed itself: http://shortn/_Mcf0xChXfd .

So I think we're back to ios-simulator for now, which apparently has less disk space (because these are Mac VMs with only 250GB), and has no room to auto-heal.
PS. https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/mac-dbg/1348 (the previous build) started 2018-10-24 10:17 AM (PDT) and worked just fine. So the timing is really within these 13 or so minutes.
List of CLs that landed right before 10:30am PDT:

5ff40c0e4cb0 2018-10-24 17:27:17 +0000 [run_web_tests] Check for extra baselines
319faa6f9fc6 2018-10-24 17:21:44 +0000 Roll WebGL 6d2f3f4..0d55c88
a4ae44e5446e 2018-10-24 17:13:23 +0000 gfx:: Convert blit_unittest to the new shared memory API
aec5d4fc49d5 2018-10-24 17:12:48 +0000 Enable QUIC connection migration tests for QUIC v99. * Make the QuicPacketCreator use an explicitly passed in   QuicRandom * Make PATH_CHALLENGE and PATH_RESPONSE frames instigate acks.
41a1f2475c30 2018-10-24 17:07:57 +0000 Update V8 to version 7.2.101.
5f13cb27e8f7 2018-10-24 17:07:13 +0000 [Autofill Assistant] Server Payload is saved between requests.

One of these must be the culprit: something that may affect all Macs, and possibly other platforms, and add to disk usage at compile time.
Most suspected:
https://crrev.com/c/1297570 Roll WebGL 6d2f3f4..0d55c88               17:21:44
https://crrev.com/c/1286894 [run_web_tests] Check for extra baselines 17:27:17
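The commit timestamps in the list are in UTC (+0000), while the build times are quoted in PDT. As a rough sketch, the conversion and window filter can be written in Python as below. The hashes and timestamps are copied from the list above; the window bounds are the mac-dbg/1348 start time and the ios-simulator/121499 trigger time quoted earlier. Treating those two times as a commit window is an assumption, since a build's start time is not the same as the revision it checked out.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC-7 offset; Pacific time was on daylight time on 2018-10-24.
PDT = timezone(timedelta(hours=-7), "PDT")

# (short hash, commit time) pairs copied from the list of CLs.
COMMITS = [
    ("5ff40c0e4cb0", "2018-10-24 17:27:17 +0000"),
    ("319faa6f9fc6", "2018-10-24 17:21:44 +0000"),
    ("a4ae44e5446e", "2018-10-24 17:13:23 +0000"),
    ("aec5d4fc49d5", "2018-10-24 17:12:48 +0000"),
    ("41a1f2475c30", "2018-10-24 17:07:57 +0000"),
    ("5f13cb27e8f7", "2018-10-24 17:07:13 +0000"),
]


def to_pdt(stamp: str) -> datetime:
    """Parse a 'YYYY-MM-DD HH:MM:SS +0000' timestamp and convert it to PDT."""
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S %z").astimezone(PDT)


def landed_between(commits, start, end):
    """Return hashes of commits whose time falls within [start, end]."""
    return [h for h, stamp in commits if start <= to_pdt(stamp) <= end]


last_good = datetime(2018, 10, 24, 10, 17, tzinfo=PDT)  # mac-dbg/1348 started
first_bad = datetime(2018, 10, 24, 10, 29, tzinfo=PDT)  # ios-simulator/121499 triggered

for commit_hash in landed_between(COMMITS, last_good, first_bad):
    print(commit_hash)
```

Under these assumptions the filter leaves 5ff40c0e4cb0 and 319faa6f9fc6, the same two CLs listed as most suspected.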

It appears most bots that failed due to disk space already erased their caches: http://shortn/_YkkyIUMvpV

So it's likely not easy to see what exactly is taking up space :(
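If a bot is caught before its caches are erased, one way to see what is taking up space is to total the size of each top-level entry under a directory. The sketch below is a generic Python helper, not an existing infra tool; which directory to inspect (for example a cache directory like the /b/s/w/ir/cache path in the log above) is an assumption. It sums apparent file sizes, which can differ from the blocks actually allocated on disk.

```python
import os


def dir_size(path: str) -> int:
    """Sum the apparent sizes of all files under path, without following symlinks."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.lstat(os.path.join(root, name)).st_size
            except OSError:
                # The file vanished or is unreadable; skip it.
                pass
    return total


def largest_children(parent: str, top_n: int = 10):
    """Return up to top_n (size_in_bytes, name) pairs for entries directly under parent, largest first."""
    sizes = []
    for entry in os.scandir(parent):
        if entry.is_dir(follow_symlinks=False):
            sizes.append((dir_size(entry.path), entry.name))
        else:
            sizes.append((entry.stat(follow_symlinks=False).st_size, entry.name))
    return sorted(sizes, reverse=True)[:top_n]


if __name__ == "__main__":
    for size, name in largest_children("."):
        print(f"{size / 1e9:8.2f} GB  {name}")
```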
Cc: kbr@chromium.org
For the record, I doubt that my WebGL conformance roll caused this issue. The number of changes in the roll https://crrev.com/c/1297570 was small.

FYI, here's a link to the current *max* disk space across bots by OS and OS version: http://shortn/_nagzJhBPte

This will be useful to evaluate if a revert worked.
Adding here FTR: it's also possible that something in build.git or other places got added around that time and contributed to the overall disk usage. E.g. we install a number of things through gclient runhooks as well as various infra pre-task stages. Things to look for are various SDKs, for example.
Labels: -Pri-0 Pri-2
Win tests issue above was due to https://chromium.googlesource.com/chromium/src/+/1c6f831f14152cf4ca9e23563757d61524487234 which unfortunately triggered a bunch of flaky tests.

Otherwise things look like they're recovering?
Status: Closed (was: Assigned)
Indeed, I don't see disk usage being out of the ordinary.

Both https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/ios-simulator and 
https://ci.chromium.org/p/chromium/builders/luci.chromium.try/ios-simulator have recovered.

https://ci.chromium.org/p/chromium/builders/luci.chromium.try/ios-simulator/121566 [2018-10-24 11:19 AM (PDT)] is the last CQ build with a clear out of disk space error; and https://ci.chromium.org/p/chromium/builders/luci.chromium.try/ios-simulator/121629 [2018-10-24 12:08 PM (PDT)] may be another one (less clear error). 

This specific outage is over, and I filed issue 898686 to track the high disk usage problem and reduce the chance of this happening again.
Both builds in #25 indeed show a disk space problem on their respective machines, followed by auto-healing:
http://shortn/_bvDFdTj9OG
