Issue 868141

Starred by 2 users

Issue metadata

Status: WontFix
Owner:
Closed: Jul 31
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: ----




reef DUTs are experiencing TIMED_OUT in repair

Project Member Reported by xixuan@chromium.org, Jul 26

Issue description

https://chrome-swarming.appspot.com/bot?id=chromeos-skylab-bot-0d4bd243-de01-4d33-8d42-c8a795cf7e0e&sort_stats=total%3Adesc
https://chrome-swarming.appspot.com/bot?id=chromeos-skylab-bot-95770639-a879-473d-8509-22e453bc5596&sort_stats=total%3Adesc
https://chrome-swarming.appspot.com/bot?id=chromeos-skylab-bot-82709c0b-67a9-4d15-9e9b-581c5ef7510a&sort_stats=total%3Adesc

This may be a different problem from Issue 865171, based on the following tasks I checked:
https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/swarming-3eec8488ccbada11/debug/
https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/swarming-3ef09dbe7817af11/

It seems that rsync/scp is having problems, for example:

07/25 16:11:42.484 DEBUG|          ssh_host:0301| Running (ssh) '/usr/local/autotest/result_tools/utils.py -p /var/log -m 20000' from 'parallel_simple|repair|collect_logs|run_on_client|run|run_very_slowly'
07/25 16:11:42.754 ERROR|             utils:0286| [stderr] bash: /usr/local/autotest/result_tools/utils.py: No such file or directory
07/25 16:11:42.828 ERROR|            runner:0121| Non-critical failure: Failed to create directory summary for /var/log.
Traceback (most recent call last):
  File "/usr/local/autotest/client/bin/result_tools/runner.py", line 114, in run_on_client
    timeout=_BUILD_DIR_SUMMARY_TIMEOUT)
  File "/usr/local/autotest/server/hosts/ssh_host.py", line 323, in run
    return self.run_very_slowly(*args, **kwargs)
  File "/usr/local/autotest/server/hosts/ssh_host.py", line 312, in run_very_slowly
    ssh_failure_retry_ok)
  File "/usr/local/autotest/server/hosts/ssh_host.py", line 262, in _run
    raise error.AutoservRunError("command execution error", result)
AutoservRunError: command execution error
 
or

07/26 11:20:59.789 DEBUG|             utils:0218| Running 'rsync -l --safe-links  --timeout=1800 --rsh='/usr/bin/ssh -a -x -o ControlPath=/tmp/_autotmp_AHHxzlssh-master/socket -o Protocol=2 -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o BatchMode=yes -o ConnectTimeout=30 -o ServerAliveInterval=900 -o ServerAliveCountMax=3 -o ConnectionAttempts=4 -l root -p 22' -az --no-o --no-g  root@chromeos6-row3-rack12-host15:"/var/log" "/usr/local/autotest/results/swarming-3ef09dbe7817af11/chromeos6-row3-rack12-host15/before_repair"'
07/26 11:36:25.492 WARNI|      abstract_ssh:0448| rsync status 255, retrying
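
To see how long the rsync ran before it returned status 255, the two timestamps above can be compared directly. The following is a minimal sketch, not part of autotest: the regular expression, `parse_line`, and the `year` default are assumptions made for illustration (the log lines carry no year; 2018 is taken from the logdog timestamp elsewhere in this issue), and the rsync command line is abbreviated.

```python
import re
from datetime import datetime

# Hypothetical pattern for the autotest debug log layout seen above:
#   "MM/DD HH:MM:SS.mmm LEVEL|   module:lineno| message"
LOG_LINE = re.compile(
    r"^(?P<ts>\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"(?P<level>\w+)\|\s*(?P<module>\w+):(?P<lineno>\d+)\| (?P<msg>.*)$"
)


def parse_line(line, year=2018):
    """Return (timestamp, level, module, message), or None if the line does not match.

    The year is an assumption, since the log format omits it.
    """
    m = LOG_LINE.match(line)
    if m is None:
        return None
    ts = datetime.strptime(f"{year}/{m['ts']}", "%Y/%m/%d %H:%M:%S.%f")
    return ts, m["level"], m["module"], m["msg"]


# The two lines from the log excerpt (rsync command abbreviated here).
start_line = "07/26 11:20:59.789 DEBUG|             utils:0218| Running 'rsync -l --safe-links  --timeout=1800 ...'"
retry_line = "07/26 11:36:25.492 WARNI|      abstract_ssh:0448| rsync status 255, retrying"

start = parse_line(start_line)
retry = parse_line(retry_line)

# Seconds between starting rsync and the "status 255, retrying" warning.
elapsed = (retry[0] - start[0]).total_seconds()
print(f"rsync ran for {elapsed:.3f} s before the retry warning")
```

For these two lines the difference is about 15 minutes 26 seconds, which is well under the `--timeout=1800` passed to rsync, so the failure does not appear to be rsync's own I/O timeout expiring.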
 
Cc: shu...@chromium.org
Here I see reef failing in the SkylabHWTest stage:

https://luci-logdog.appspot.com/v/?s=chromeos/bb/chromeos/reef-paladin/6419/+/recipes/steps/SkylabHWTest__provision___reef_/0/stdout

The master paladin is failing, probably due to this.


Or see, for example:

https://luci-logdog.appspot.com/v/?s=chromeos/bb/chromeos/reef-paladin/6423/+/recipes/steps/SkylabHWTest__provision___reef_/0/stdout

2018-07-26 15:06:06,192 INFO | Found 0 successfully provisioned duts, the minimum requirement is 1
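
When scanning many provision logs for this condition, the line above can be matched mechanically. This is a rough sketch only: `PROVISION_LINE` and `provision_shortfall` are hypothetical names, and the pattern is written against the single message shown here rather than taken from the code that emits it.

```python
import re

# Hypothetical pattern matching the provision summary message quoted above.
PROVISION_LINE = re.compile(
    r"Found (?P<found>\d+) successfully provisioned duts, "
    r"the minimum requirement is (?P<required>\d+)"
)


def provision_shortfall(line):
    """Return how many DUTs are missing, 0 if the minimum is met, None if no match."""
    m = PROVISION_LINE.search(line)
    if m is None:
        return None
    return max(0, int(m["required"]) - int(m["found"]))


line = ("2018-07-26 15:06:06,192 INFO | Found 0 successfully provisioned duts, "
        "the minimum requirement is 1")
shortfall = provision_shortfall(line)
print(f"missing DUTs: {shortfall}")
```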



Re #1: reef is in the Skylab experiment and is marked as non-important, so it won't affect master-paladin.
Issue 868335 has been merged into this issue.
Owner: xixuan@chromium.org
Status: WontFix (was: Untriaged)
