In bug #675646, we are postulating that the build slave was out of disk space.
Ideally, something in the logs should have made this more obvious.
Specifically, the error that we saw was:
===
@@@STEP_FAILURE@@@
06:40:28: ERROR: <type 'exceptions.IOError'>: [Errno 28] No space left on device
Traceback (most recent call last):
  File "/b/cbuild/internal_master/chromite/lib/parallel.py", line 602, in TaskRunner
    task(*x, **task_kwargs)
  File "/b/cbuild/internal_master/chromite/lib/parallel.py", line 800, in <lambda>
    fn = lambda idx, task_args: out_queue.put((idx, task(*task_args)))
  File "/b/cbuild/internal_master/chromite/lib/paygen/paygen_build_lib.py", line 268, in _GenerateSinglePayload
    dry_run=dry_run)
  File "/b/cbuild/internal_master/chromite/lib/paygen/paygen_payload_lib.py", line 837, in CreateAndUploadPayload
    dry_run=dry_run).Run()
  File "/b/cbuild/internal_master/chromite/lib/paygen/paygen_payload_lib.py", line 707, in Run
    self._drm(self._VerifyPayload)
  File "/b/cbuild/internal_master/chromite/lib/paygen/dryrun_lib.py", line 45, in __call__
    return self.Run(func, *args, **kwargs)
  File "/b/cbuild/internal_master/chromite/lib/paygen/dryrun_lib.py", line 82, in Run
    return self._Call(func, *args, **kwargs)
  File "/b/cbuild/internal_master/chromite/lib/paygen/dryrun_lib.py", line 86, in _Call
    return func(*args, **kwargs)
  File "/b/cbuild/internal_master/chromite/lib/paygen/paygen_payload_lib.py", line 681, in _VerifyPayload
    self._ApplyPayload(payload, is_delta)
  File "/b/cbuild/internal_master/chromite/lib/paygen/paygen_payload_lib.py", line 641, in _ApplyPayload
    payload.Apply(bspatch_path=bspatch_path, **part_files)
  File "/b/cbuild/internal_master/src/platform/dev/host/lib/update_payload/payload.py", line 321, in Apply
    old_rootfs_part=old_rootfs_part)
  File "/b/cbuild/internal_master/src/platform/dev/host/lib/update_payload/applier.py", line 569, in Run
    self.payload.manifest.old_rootfs_info)
  File "/b/cbuild/internal_master/src/platform/dev/host/lib/update_payload/applier.py", line 517, in _ApplyToPartition
    new_part_file, new_part_info.size)
  File "/b/cbuild/internal_master/src/platform/dev/host/lib/update_payload/applier.py", line 458, in _ApplyOperations
    self._ApplyReplaceOperation(op, op_name, data, new_part_file, part_size)
  File "/b/cbuild/internal_master/src/platform/dev/host/lib/update_payload/applier.py", line 269, in _ApplyReplaceOperation
    part_file.write(out_data[data_start:data_end])
IOError: [Errno 28] No space left on device
<type 'exceptions.IOError'>: [Errno 28] No space left on device
===
With this error, it's unclear to the oblivious sheriff / deputy / trooper whether we ran out of space for something Chrome OS related (for example, we blew out the image size) or whether we ran out of space on the build slave itself.
===
A proposal is to catch errors _somewhere_ in that call stack (maybe just catch IOErrors?) and then print the output of "df -h" to the logs. Plausibly you could even look for "100%" somewhere in the text and print a hint that the build slave might be out of space.
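As a rough illustration of the "df -h" idea, here is a minimal Python sketch. The function names (find_full_filesystems, out_of_space_hint, log_disk_usage) are hypothetical and are not part of chromite; the sketch assumes a "df" binary is on the PATH and that a full filesystem shows up as a standalone "100%" field in its output.

```python
import subprocess


def find_full_filesystems(df_output):
    """Return the lines of `df -h` output that contain a standalone 100% field.

    Hypothetical helper, not part of chromite. The header line is skipped.
    """
    full = []
    for line in df_output.splitlines()[1:]:
        if "100%" in line.split():
            full.append(line)
    return full


def out_of_space_hint(df_output):
    """Build a hint string for the logs, or '' if no filesystem looks full."""
    full = find_full_filesystems(df_output)
    if not full:
        return ""
    return ("HINT: the build slave may be out of disk space:\n"
            + "\n".join(full))


def log_disk_usage(log_fn=print):
    """Run `df -h`, log its output, and log a hint if anything is at 100%."""
    try:
        output = subprocess.check_output(["df", "-h"],
                                         universal_newlines=True)
    except (OSError, subprocess.CalledProcessError) as e:
        # Never let the diagnostic itself mask the original failure.
        log_fn("Could not run df -h: %s" % e)
        return
    log_fn(output)
    hint = out_of_space_hint(output)
    if hint:
        log_fn(hint)
```

Calling log_disk_usage() from whatever handler ends up catching the error would put both the raw "df -h" table and the hint into the step log. Matching on the literal "100%" is deliberately crude; a filesystem at 99% can still fail a large write.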
===
See bug #676152 for an example of a different script that does something similar. In that case it's a bash script that catches things, but Python should be able to catch exceptions too.
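A minimal sketch of what the Python side could look like, assuming the check is wrapped around the payload verification step. The name hint_on_enospc and its parameters are hypothetical, and this is written for Python 3, where IOError is an alias of OSError; the code in the traceback above is Python 2, where the except clause would need to name IOError as well.

```python
import contextlib
import errno
import logging
import shutil


@contextlib.contextmanager
def hint_on_enospc(path="/", log=logging.getLogger(__name__)):
    """Log a disk-usage hint when the wrapped code fails with ENOSPC.

    Hypothetical helper, not part of chromite. `path` is any path on the
    filesystem suspected of filling up. The original exception is always
    re-raised so the step still fails as before.
    """
    try:
        yield
    except OSError as e:  # Covers IOError in Python 3.
        if e.errno == errno.ENOSPC:
            try:
                usage = shutil.disk_usage(path)
                log.error(
                    "HINT: the build slave may be out of disk space: "
                    "%s has %d of %d bytes free",
                    path, usage.free, usage.total)
            except OSError:
                # Reporting is best effort; do not hide the real error.
                log.error("HINT: the build slave may be out of disk space")
        raise
```

Usage would be along the lines of `with hint_on_enospc("/b"): ...` around the call that applies the payload. Checking errno rather than catching every IOError keeps unrelated I/O failures from producing a misleading hint.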
Comment 1 by semenzato@chromium.org, Dec 22 2016