chromeos4-row11-rack11-host7 provision/repair looping.
Issue description
chromeos4-row11-rack11-host7 repeatedly fails provision with:
FAIL provision_AutoUpdate provision_AutoUpdate timestamp=1513377974 localtime=Dec 15 14:46:14 Unhandled DevServerException: CrOS auto-update failed for host chromeos4-row11-rack11-host7: 0) RootfsUpdateError: Failed to perform rootfs update: DevServerStartupError('Timeout (30) waiting for remote devserver port_file',), 1) RootfsUpdateError: Failed to perform rootfs update: DevServerStartupError('Timeout (30) waiting for remote devserver port_file',),
Traceback (most recent call last):
  File "/usr/local/autotest/client/common_lib/test.py", line 831, in _call_test_function
    return func(*args, **dargs)
  File "/usr/local/autotest/client/common_lib/test.py", line 495, in execute
    dargs)
  File "/usr/local/autotest/client/common_lib/test.py", line 362, in _call_run_once_with_retry
    postprocess_profiled_run, args, dargs)
  File "/usr/local/autotest/client/common_lib/test.py", line 400, in _call_run_once
    self.run_once(*args, **dargs)
  File "/usr/local/autotest/server/site_tests/provision_AutoUpdate/provision_AutoUpdate.py", line 126, in run_once
    with_cheets=with_cheets)
  File "/usr/local/autotest/server/afe_utils.py", line 124, in machine_install_and_update_labels
    *args, **dargs)
  File "/usr/local/autotest/server/hosts/cros_host.py", line 828, in machine_install_by_devserver
    quick_provision=quick_provision)
  File "/usr/local/autotest/client/common_lib/cros/dev_server.py", line 2391, in auto_update
    error_msg % (host_name, real_error))
DevServerException: CrOS auto-update failed for host chromeos4-row11-rack11-host7: 0) RootfsUpdateError: Failed to perform rootfs update: DevServerStartupError('Timeout (30) waiting for remote devserver port_file',), 1) RootfsUpdateError: Failed to perform rootfs update: DevServerStartupError('Timeout (30) waiting for remote devserver port_file',),
The DUT passes all tests in repair.
,
Dec 16 2017
I logged in and ran:
touch /var/tmp/provision_failed
Now I'm forcing a repair.
,
Dec 16 2017
Re c#2 - given the symptoms, I'm going to guess that the initial powerwash repair will fail. If the DUT's servo is working, the DUT will pass by re-installing from USB. If the servo isn't working, the DUT will fail repair. This failure is the same symptom as bug 795396. Overall, the symptom seems too specific not to be a duplicate...
,
Dec 16 2017
USB Repair failed with the very surprising:
chroot: failed to run command '/usr/bin/cros_installer': No such file or directory
END FAIL ---- repair.usb timestamp=1513383344 localtime=Dec 15 16:15:44
GOOD ---- verify.hwid timestamp=1513383349 localtime=Dec 15 16:15:49
,
Dec 16 2017
I have NOT locked this DUT, but it is now marked bad because of the failed repair.
,
Dec 16 2017
I'm unmarking duplicate; the proximate symptom is the same,
but digging deeper, there are differences:
* chromeos-install to the DUT fails; postinst can't find
cros_installer.
* Looking at the internal storage after install, things seemed
a bit odd. In particular, I think ROOT-A should have had the
target image, but after mounting the file system, the contents
of /etc weren't right; it had what looked like /usr...
No errors were reported against either the USB or the internal
storage, but there was clearly something wrong _somewhere_.
,
Jan 5 2018
Passing to current deputy.
,
Jan 5 2018
The DUT was locked, but also "Repair Failed". So, I've unlocked the DUT and kicked off a repair. We can decide next actions once that produces results.
,
Jan 6 2018
I've seen the DUT through a variety of repair cycles, and
eventually manually forced boot from USB and ran chromeos-install.
The DUT seems to consistently fail when installing from USB with
errors that look like this:
Installing partition 1 to /dev/mmcblk0
Installing the stateful partition...
/usr/sbin/chromeos-install: 746: /usr/sbin/chromeos-install: /tmp/install-mount-point/postinst: not found
After that happened, I logged into the DUT and took a peek in the
temporary mount point:
localhost ~ # ls /tmp/install-mount-point/
ls: cannot access '/tmp/install-mount-point/etc': Structure needs cleaning
ls: cannot access '/tmp/install-mount-point/var': Structure needs cleaning
ls: cannot access '/tmp/install-mount-point/opt': Structure needs cleaning
ls: cannot access '/tmp/install-mount-point/mnt': Structure needs cleaning
ls: cannot access '/tmp/install-mount-point/debugd': Structure needs cleaning
ls: cannot access '/tmp/install-mount-point/sbin': Structure needs cleaning
ls: cannot access '/tmp/install-mount-point/lib': Structure needs cleaning
ls: cannot access '/tmp/install-mount-point/lost+found': Structure needs cleaning
ls: cannot access '/tmp/install-mount-point/proc': Structure needs cleaning
ls: cannot access '/tmp/install-mount-point/run': Structure needs cleaning
bin build dev home lib64 media opt proc run sys usr
boot debugd etc lib lost+found mnt postinst root sbin tmp var
Just to get a better feel, I also tried this:
localhost ~ # ls -l /tmp/install-mount-point/postinst
lrwxrwxrwx. 1 32 root 26 Dec 14 11:59 /tmp/install-mount-point/postinst -> usr/sbinochromeos-postilst
So, whatever is wrong, the end result is a corrupted file system
on the target disk before postinst could run.
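For anyone repeating this check on another DUT, here is a minimal sketch of the same inspection in Python. It lists the top-level entries of a mount point and flags any that fail lstat with EUCLEAN (the errno behind "Structure needs cleaning") or that are symlinks pointing at nothing. The function name scan_top_level is hypothetical and not part of autotest or chromeos-install; the numeric fallback for EUCLEAN assumes Linux.

```python
import errno
import os

# EUCLEAN is the errno Linux reports as "Structure needs cleaning".
# The numeric fallback (117) is an assumption for platforms lacking the name.
EUCLEAN = getattr(errno, "EUCLEAN", 117)


def scan_top_level(mount_point):
    """Return (name, problem) pairs for suspicious top-level entries.

    Hypothetical helper for illustration only.
    """
    problems = []
    for name in sorted(os.listdir(mount_point)):
        path = os.path.join(mount_point, name)
        try:
            # lstat does not follow symlinks, so a broken link still succeeds here.
            os.lstat(path)
        except OSError as err:
            if err.errno == EUCLEAN:
                problems.append((name, "structure needs cleaning"))
            else:
                problems.append((name, os.strerror(err.errno)))
            continue
        # os.path.exists follows the link, so it is False for a dangling target.
        if os.path.islink(path) and not os.path.exists(path):
            problems.append((name, "dangling symlink -> " + os.readlink(path)))
    return problems
```

Calling scan_top_level("/tmp/install-mount-point") after a failed install would be the intended use; a healthy file system should produce an empty list.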
/var/log/messages doesn't show any error complaints about the internal
storage device, so I've no idea what went wrong. However, a spot-check
against a different cyan DUT with the same procedure says that
chromeos-install works, so the problem seems to be specific to this one
DUT.
,
Jan 6 2018
The problem just _has_ to be failing FPROM storage:
$ echo usr/sbinochromeos-postilst | od -b
0000000 165 163 162 057 163 142 151 156 157 143 150 162 157 155 145 157
0000020 163 055 160 157 163 164 151 154 163 164 012
0000033
$ echo usr/sbin/chromeos-postinst | od -b
0000000 165 163 162 057 163 142 151 156 057 143 150 162 157 155 145 157
0000020 163 055 160 157 163 164 151 156 163 164 012
0000033
There are two single-bit errors:
* At octal offset 010, the value is 157 in place of 057
(a '1' stored in place of a '0').
* At octal offset 027, the value is 154 in place of 156
(a '0' stored in place of a '1').
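The od comparison above can also be done mechanically. The following is a small sketch, not a tool from the lab: diff_bits is a hypothetical helper that XORs the expected and observed symlink targets byte by byte and reports, for each differing byte, which bits are set that should be clear and which are clear that should be set. Offsets and byte values are printed in octal to line up with the od -b output.

```python
def diff_bits(expected: bytes, actual: bytes):
    """Compare two equal-length byte strings.

    Returns a list of (offset, expected_byte, actual_byte, set_bits, cleared_bits)
    for every differing byte. Hypothetical helper for illustration.
    """
    flips = []
    for offset, (want, got) in enumerate(zip(expected, actual)):
        if want == got:
            continue
        mask = want ^ got            # bits that differ
        set_bits = got & mask        # read back as 1, should be 0
        cleared_bits = want & mask   # read back as 0, should be 1
        flips.append((offset, want, got, set_bits, cleared_bits))
    return flips


# Same inputs as the two echo commands above (echo appends the newline).
expected = b"usr/sbin/chromeos-postinst\n"
actual = b"usr/sbinochromeos-postilst\n"

for offset, want, got, set_bits, cleared_bits in diff_bits(expected, actual):
    print(f"offset {offset:03o}: expected {want:03o}, found {got:03o}, "
          f"bits set {set_bits:#04x}, bits cleared {cleared_bits:#04x}")
```

A differing byte whose mask has exactly one bit set is a single-bit error, which is the pattern you would expect from failing storage rather than from a software bug.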
,
Jan 6 2018
Comment 1 by dgarr...@chromium.org
, Dec 15 2017