
Issue 798823


Issue metadata

Status: Archived
Owner: ----
Closed: May 2018
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 3
Type: Bug




push_to_prod failing in provision with ENOSPC

Reported by jrbarnette@chromium.org, Jan 3 2018

Issue description

Push to prod testing has been failing for a week or more.
The most recent logs show this (among other problems):

[chromeos-staging-master2.hot.corp.google.com] out: provision                     FAIL
...
[chromeos-staging-master2.hot.corp.google.com] out: 1 test(s) are not expected to be run:
[chromeos-staging-master2.hot.corp.google.com] out: provision

It takes quite a lot of digging, but if you look hard enough,
you can find this suite job:
    https://ubercautotest-staging.corp.google.com/afe/#tab_id=view_job&object_id=8346
which includes this failure:
    https://ubercautotest-staging.corp.google.com/afe/#tab_id=view_job&object_id=8348
From which, if you know the magic, you can find these logs:
    https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/8348-chromeos-test/chromeos2-row1-rack2-host13/

In the logs of the failed provision task, you can find this snippet:
2018/01/03 09:07:31.875 DEBUG|    cros_build_lib:0593| RunCommand: ssh -p 22 '-oConnectionAttempts=4' '-oUserKnownHostsFile=/dev/null' '-oProtocol=2' '-oConnectTimeout=30' '-oServerAliveCountMax=3' '-oStrictHostKeyChecking=no' '-oServerAliveInterval=10' '-oNumberOfPasswordPrompts=0' '-oIdentitiesOnly=yes' -i /tmp/ssh-tmpV5eCHW/testing_rsa root@chromeos2-row1-rack2-host13 -- sh /mnt/stateful_partition/unencrypted/preserve/cros-update/tmp.iVhhCFnKHj/stateful_update /mnt/stateful_partition/unencrypted/preserve/cros-update/tmp.iVhhCFnKHj/stateful.tgz '--stateful_change=clean'
Warning: Permanently added 'chromeos2-row1-rack2-host13,100.115.245.231' (ED25519) to the list of known hosts.
Warning: Permanently added 'chromeos2-row1-rack2-host13,100.115.245.231' (ED25519) to the list of known hosts.
Reading local payload /mnt/stateful_partition/unencrypted/preserve/cros-update/tmp.iVhhCFnKHj/stateful.tgz
tar: dev_image_new/telemetry/src/data/page_cycler/moz/www.nytimes.com_Table/index.html: Wrote only 8192 of 10240 bytes
tar: dev_image_new/telemetry/src/data/page_cycler/moz/espn.go.com: Cannot mkdir: No space left on device
tar: dev_image_new/telemetry/src/data/page_cycler/moz/espn.go.com: Cannot mkdir: No space left on device

The space problem persists later on, too.
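Since the relevant lines are buried in a long provision log, a small filter can pull them out. The following is only a sketch: the function name find_space_errors is hypothetical, and the pattern assumes the tar messages look exactly like the two forms quoted above ("Wrote only N of M bytes" and "Cannot <op>: No space left on device").

```python
import re

# Matches the two tar error forms seen in the provision log, e.g.
#   tar: <path>: Cannot mkdir: No space left on device
#   tar: <path>: Wrote only 8192 of 10240 bytes
# Other message shapes are deliberately not matched.
_SPACE_ERROR_RE = re.compile(
    r"^tar: (?P<path>.+?): "
    r"(?:Cannot \w+: No space left on device|Wrote only \d+ of \d+ bytes)$"
)


def find_space_errors(log_text):
    """Return the paths tar failed on for lack of space, in log order."""
    paths = []
    for line in log_text.splitlines():
        match = _SPACE_ERROR_RE.match(line.strip())
        if match:
            paths.append(match.group("path"))
    return paths
```

Feeding it the text of the failed provision task's log should list every path tar could not write, which makes it easier to see where in the unpack the space ran out.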

 
Logging in to the DUT, there's no obvious space problem:
    localhost ~ # df -m /mnt/stateful_partition/
    Filesystem     1M-blocks  Used Available Use% Mounted on
    /dev/mmcblk0p1     10549  1767      8228  18% /mnt/stateful_partition

That's the state after the failure, without any re-imaging taking
place.  You can still see leftovers from the failure(s):
    ls /mnt/stateful_partition/unencrypted/preserve/cros-update/tmp.8uSmEnmj4x    
    src  stateful.tgz  stateful_update
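Note that df -m reports only block usage. ENOSPC can also be returned when a filesystem runs out of inodes, so it may be worth checking both. A minimal sketch, assuming Python is available on the DUT; the helper name space_report is hypothetical, and nothing in the logs confirms that inodes are the cause here.

```python
import os


def space_report(path):
    """Return free space and inode counts for the filesystem holding path."""
    st = os.statvfs(path)
    return {
        # Space available to unprivileged users, in whole megabytes.
        "free_mb": st.f_bavail * st.f_frsize // (1024 * 1024),
        # Inodes available to unprivileged users, and the total count.
        "free_inodes": st.f_favail,
        "total_inodes": st.f_files,
    }

# Example (on the DUT):
#   print(space_report("/mnt/stateful_partition"))
```

Running this while the stateful update is unpacking, rather than after the failure, would show whether either number actually approaches zero.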

Labels: -Pri-2 Pri-3
It turns out that push to prod _hasn't_ been "failing for a week or more".
It's not even clear whether this incident represents an actual
failure, or just a red herring.

Holding this open for a bit longer, but there's a good chance
I'll just throw up my hands and say "WontFix".
Status: Archived (was: Untriaged)
