Issue 795456

Issue metadata

Status: Fixed
Merged: issue 795396
Owner:
Closed: Jan 2018
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 3
Type: Bug



chromeos4-row11-rack11-host7 provision/repair looping.

Project Member Reported by dgarr...@chromium.org, Dec 15 2017

Issue description

chromeos4-row11-rack11-host7 repeatedly fails provision with:

FAIL	provision_AutoUpdate	provision_AutoUpdate	timestamp=1513377974	localtime=Dec 15 14:46:14	Unhandled DevServerException: CrOS auto-update failed for host chromeos4-row11-rack11-host7: 0) RootfsUpdateError: Failed to perform rootfs update: DevServerStartupError('Timeout (30) waiting for remote devserver port_file',), 1) RootfsUpdateError: Failed to perform rootfs update: DevServerStartupError('Timeout (30) waiting for remote devserver port_file',), 
  Traceback (most recent call last):
    File "/usr/local/autotest/client/common_lib/test.py", line 831, in _call_test_function
      return func(*args, **dargs)
    File "/usr/local/autotest/client/common_lib/test.py", line 495, in execute
      dargs)
    File "/usr/local/autotest/client/common_lib/test.py", line 362, in _call_run_once_with_retry
      postprocess_profiled_run, args, dargs)
    File "/usr/local/autotest/client/common_lib/test.py", line 400, in _call_run_once
      self.run_once(*args, **dargs)
    File "/usr/local/autotest/server/site_tests/provision_AutoUpdate/provision_AutoUpdate.py", line 126, in run_once
      with_cheets=with_cheets)
    File "/usr/local/autotest/server/afe_utils.py", line 124, in machine_install_and_update_labels
      *args, **dargs)
    File "/usr/local/autotest/server/hosts/cros_host.py", line 828, in machine_install_by_devserver
      quick_provision=quick_provision)
    File "/usr/local/autotest/client/common_lib/cros/dev_server.py", line 2391, in auto_update
      error_msg % (host_name, real_error))
  DevServerException: CrOS auto-update failed for host chromeos4-row11-rack11-host7: 0) RootfsUpdateError: Failed to perform rootfs update: DevServerStartupError('Timeout (30) waiting for remote devserver port_file',), 1) RootfsUpdateError: Failed to perform rootfs update: DevServerStartupError('Timeout (30) waiting for remote devserver port_file',), 


The DUT passes all tests in repair.
 
I logged in, did:

touch /var/tmp/provision_failed

And now I'm forcing a repair.
Mergedinto: 795396
Status: Duplicate (was: Untriaged)
Re c#2 - given the symptoms, I'm going to guess that the initial
powerwash repair will fail.  If the DUT's servo is working, the
DUT will pass by re-installing from USB.  If the servo isn't
working, the DUT will fail repair.

This failure is the same symptom as bug 795396.  Overall, the
symptom seems too specific not to be a duplicate...
USB Repair failed with the very surprising:

  chroot: failed to run command '/usr/bin/cros_installer': No such file or directory
	END FAIL	----	repair.usb	timestamp=1513383344	localtime=Dec 15 16:15:44	
	GOOD	----	verify.hwid	timestamp=1513383349	localtime=Dec 15 16:15:49	

I have NOT locked this DUT, but it is now marked bad because of the failed repair.
Status: Assigned (was: Duplicate)
I'm unmarking duplicate; the proximate symptom is the same,
but digging deeper, there are differences:
  * chromeos-install to the DUT fails; postinst can't find
    cros_installer.
  * Looking at the internal storage after install, things seemed
    a bit odd.  In particular, I think ROOT-A should have had the
    target image, but after mounting the file system, the contents
    of /etc weren't right; it had what looked like /usr...

There were no errors reported against either the USB or the internal
storage, but there was clearly something wrong _somewhere_.

Owner: ayatane@chromium.org
Passing to current deputy.
The DUT was locked, but also "Repair Failed".  So, I've unlocked the DUT
and kicked off a repair.  We can decide next actions once that produces
results.

I've seen the DUT through a variety of repair cycles, and
eventually manually forced boot from USB and ran chromeos-install.
The DUT seems to consistently fail when installing from USB with
errors that look like this:
    Installing partition 1 to /dev/mmcblk0
    Installing the stateful partition...
    /usr/sbin/chromeos-install: 746: /usr/sbin/chromeos-install: /tmp/install-mount-point/postinst: not found

After that happened, I logged into the DUT and took a peek in the
temporary mount point:
    localhost ~ # ls /tmp/install-mount-point/
    ls: cannot access '/tmp/install-mount-point/etc': Structure needs cleaning
    ls: cannot access '/tmp/install-mount-point/var': Structure needs cleaning
    ls: cannot access '/tmp/install-mount-point/opt': Structure needs cleaning
    ls: cannot access '/tmp/install-mount-point/mnt': Structure needs cleaning
    ls: cannot access '/tmp/install-mount-point/debugd': Structure needs cleaning
    ls: cannot access '/tmp/install-mount-point/sbin': Structure needs cleaning
    ls: cannot access '/tmp/install-mount-point/lib': Structure needs cleaning
    ls: cannot access '/tmp/install-mount-point/lost+found': Structure needs cleaning
    ls: cannot access '/tmp/install-mount-point/proc': Structure needs cleaning
    ls: cannot access '/tmp/install-mount-point/run': Structure needs cleaning
    bin   build   dev  home  lib64       media  opt       proc  run   sys  usr
    boot  debugd  etc  lib   lost+found  mnt    postinst  root  sbin  tmp  var

Just to get a better feel, I also tried this:
    localhost ~ # ls -l /tmp/install-mount-point/postinst
    lrwxrwxrwx. 1 32 root 26 Dec 14 11:59 /tmp/install-mount-point/postinst -> usr/sbinochromeos-postilst

So, whatever is wrong, the end result is a corrupted file system
on the target disk before postinst could run.

/var/log/messages doesn't show any error complaints about the internal
storage device, so I've no idea what went wrong.  However, a spot-check
against a different cyan DUT with the same procedure says that
chromeos-install works, so the problem seems to be specific to this one
DUT.

The problem just _has_ to be failing FPROM storage:
    $ echo usr/sbinochromeos-postilst | od -b
    0000000 165 163 162 057 163 142 151 156 157 143 150 162 157 155 145 157
    0000020 163 055 160 157 163 164 151 154 163 164 012
    0000033
    $ echo usr/sbin/chromeos-postinst | od -b 
    0000000 165 163 162 057 163 142 151 156 057 143 150 162 157 155 145 157
    0000020 163 055 160 157 163 164 151 156 163 164 012
    0000033

There are two single-bit errors, one in each direction:
  * At octal offset 010, the value is 157 in place of 057 (a '1'
    stored in place of a '0').
  * At octal offset 027, the value is 154 in place of 156 (a '0'
    stored in place of a '1').
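
The od comparison above can be double-checked by XOR-ing the two strings byte by byte. This is only a minimal sketch in Python; `bit_diffs` is a hypothetical helper name for illustration, not part of any ChromeOS or autotest tooling, and the two byte strings are copied from the symlink target seen on the DUT and the expected path.

```python
def bit_diffs(expected: bytes, actual: bytes):
    """Yield (offset, expected_byte, actual_byte, xor_mask) per differing byte."""
    if len(expected) != len(actual):
        raise ValueError("inputs must have the same length")
    for offset, (e, a) in enumerate(zip(expected, actual)):
        if e != a:
            # The XOR mask has a 1 in every bit position that differs.
            yield offset, e, a, e ^ a


# Expected symlink target vs. what was read back from the DUT's storage.
expected = b"usr/sbin/chromeos-postinst"
actual = b"usr/sbinochromeos-postilst"

diffs = list(bit_diffs(expected, actual))
for offset, e, a, mask in diffs:
    # If the differing bit is set in the stored byte, a 0 became a 1.
    direction = "0 -> 1" if a & mask else "1 -> 0"
    print(f"offset {offset:03o}: expected {e:03o}, found {a:03o}, "
          f"mask {mask:03o} ({direction})")
```

Offsets and byte values are printed in octal so they line up with the od -b output. Each mask having exactly one bit set is what marks these as single-bit errors rather than overwritten bytes.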

Status: Fixed (was: Assigned)
Filed b/71650127 to have the unit replaced.