
Issue 777923

Starred by 2 users

Issue metadata

Status: Fixed
Owner:
Closed: Nov 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: Bug

Blocking:
issue 712682
issue 768542
issue 777920




Surface exactly when SSH connection failed during provision

Project Member Reported by pprabhu@chromium.org, Oct 24 2017

Issue description

We reboot the DUT a few times during provision, and we often fail to SSH into the DUT after such a reboot. Exactly when this SSH failure happens is critical -- if it happened before any update, it is likely to be an infra issue. If it happened after rootfs update, it is more likely to be a problem with the image.
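
As a rough sketch of that triage rule (not existing autotest or chromite code), the mapping from failure point to likely cause could look like the following. The stage names and the `classify_ssh_failure` helper are hypothetical, made up for illustration. Only the first and last cases come from the description above; anything in between is left unclassified.

```python
import enum


class ProvisionStage(enum.Enum):
    """Hypothetical checkpoints in a provision run, in order."""
    PRE_UPDATE = 1          # before any update has been applied
    MID_UPDATE = 2          # assumption: some point between the two known cases
    POST_ROOTFS_UPDATE = 3  # after the rootfs update


def classify_ssh_failure(stage):
    """Return the likely cause of an SSH failure seen at `stage`."""
    if stage is ProvisionStage.PRE_UPDATE:
        return 'infra'    # nothing was updated yet, so the image is not at fault
    if stage is ProvisionStage.POST_ROOTFS_UPDATE:
        return 'image'    # the DUT is running the new rootfs
    return 'unknown'      # the description gives no rule for other stages
```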

Currently, all these failures look identical in status.log:
  Traceback (most recent call last):
    File "/usr/local/autotest/client/common_lib/test.py", line 806, in _call_test_function
      return func(*args, **dargs)
    File "/usr/local/autotest/client/common_lib/test.py", line 470, in execute
      dargs)
    File "/usr/local/autotest/client/common_lib/test.py", line 347, in _call_run_once_with_retry
      postprocess_profiled_run, args, dargs)
    File "/usr/local/autotest/client/common_lib/test.py", line 380, in _call_run_once
      self.run_once(*args, **dargs)
    File "/usr/local/autotest/server/site_tests/provision_AutoUpdate/provision_AutoUpdate.py", line 121, in run_once
      with_cheets=with_cheets)
    File "/usr/local/autotest/server/afe_utils.py", line 124, in machine_install_and_update_labels
      *args, **dargs)
    File "/usr/local/autotest/server/hosts/cros_host.py", line 815, in machine_install_by_devserver
      force_original=force_original)
    File "/usr/local/autotest/client/common_lib/cros/dev_server.py", line 2355, in auto_update
      error_msg % (host_name, real_error))
  DevServerException: CrOS auto-update failed for host chromeos2-row7-rack6-host19: 0) SSHConnectionError: ssh: connect to host chromeos2-row7-rack6-host19 port 22: Connection timed out
  , 1) SSHConnectionError: ssh: connect to host 100.115.230.65 port 22: Connection timed out


Insert a failure message saying exactly when the SSH failed so that it is easier to classify these failure types.
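
One possible shape for this, as a minimal Python 3 sketch: wrap each post-reboot SSH check in a context manager that re-raises the SSH error with the stage attached. `RebootStageError`, `ssh_stage`, and `demo` are hypothetical names, and `SSHConnectionError` is redefined locally as a stand-in so the sketch runs on its own. None of this is the actual chromite change.

```python
import contextlib


class SSHConnectionError(Exception):
    """Local stand-in for the real SSH error type."""


class RebootStageError(Exception):
    """Hypothetical: SSH failed at a known point in provision."""

    def __init__(self, stage, cause):
        super().__init__('SSH failed %s: %s' % (stage, cause))
        self.stage = stage
        self.cause = cause


@contextlib.contextmanager
def ssh_stage(stage):
    """Re-raise any SSHConnectionError from the block, tagged with `stage`."""
    try:
        yield
    except SSHConnectionError as e:
        raise RebootStageError(stage, e) from e


def demo():
    """Simulate a timeout after the rootfs reboot and return the message."""
    try:
        with ssh_stage('after rootfs update'):
            raise SSHConnectionError('Connection timed out')
    except RebootStageError as e:
        return str(e)
```

Here `demo()` returns `'SSH failed after rootfs update: Connection timed out'`, so the stage appears in the top-level message instead of only a generic timeout.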
 
Blocking: 777920 768542
Cc: jrbarnette@chromium.org xixuan@chromium.org
Blocking: 712682
Labels: Chase-Pending
Justification: 

Scope is correct for Chase -- go find all the places where SSH can time out in the actual provision code, and create new exceptions that tell us what happened to the DUT and when.

Impact: As the deputy this week, I spent 75%+ of my time looking at provision failures. This would have cut that time by at least 25%.
Also, this will surface the correct error all the way back to the builders, giving users more than "DevServerException" and "SSH timed out" -- both of which they ignore, not knowing where to dig deeper.

So yes, this is about reporting; but I claim that this reporting is central to being able to deputy.

Labels: -Chase-Pending Chase
Owner: xixuan@chromium.org
CL in flight.
Project Member

Comment 7 by bugdroid1@chromium.org, Nov 7 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/chromite/+/24b04ea4a0891030da74cbd3bd17d1d644a1ff74

commit 24b04ea4a0891030da74cbd3bd17d1d644a1ff74
Author: Xixuan Wu <xixuan@chromium.org>
Date: Tue Nov 07 01:26:21 2017

auto_updater: Add error logging for different reboots.

BUG= chromium:777923 
TEST=cros flash & ds.auto_update()

Change-Id: I427c58a63b601e476dc0703d041c53e0ad922569
Reviewed-on: https://chromium-review.googlesource.com/753535
Commit-Ready: Xixuan Wu <xixuan@chromium.org>
Tested-by: Xixuan Wu <xixuan@chromium.org>
Reviewed-by: David Haddock <dhaddock@chromium.org>

[modify] https://crrev.com/24b04ea4a0891030da74cbd3bd17d1d644a1ff74/lib/remote_access.py
[modify] https://crrev.com/24b04ea4a0891030da74cbd3bd17d1d644a1ff74/lib/auto_updater.py

Status: Fixed (was: Untriaged)
