chromite not updated on some builders?
Issue description
I found a log where the chromite code is still using the flawed REBOOT_MARKER method to check for a reboot. I fixed this about a month ago (issue 667541): is it possible that this builder is still using the old code?
https://uberchromegw.corp.google.com/i/chromeos/builders/edgar-release/builds/734/steps/HWTest%20%5Bsanity%5D/logs/stdio
https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/94501594-chromeos-test/chromeos4-row12-rack8-host3/debug/

autoserv.DEBUG:
01/05 05:33:42.884 DEBUG| dev_server:1723| Current CrOS auto-update status: pre-setup rootfs update
01/05 05:33:52.932 DEBUG| base_utils:0185| Running 'ssh 100.115.219.134 'curl "http://100.115.219.134:8082/get_au_status?build_name=edgar-release/R57-9153.0.0&force_update=True&pid=30450&host_name=chromeos4-row12-rack8-host3&full_update=False"''
01/05 05:33:54.123 DEBUG| dev_server:1785| Failed to trigger auto-update process on devserver
01/05 05:33:54.123 DEBUG| base_utils:0185| Running 'ssh 100.115.219.134 'curl "http://100.115.219.134:8082/handler_cleanup?pid=30450&host_name=chromeos4-row12-rack8-host3"''
01/05 05:33:55.314 DEBUG| base_utils:0185| Running 'ssh 100.115.219.134 'curl "http://100.115.219.134:8082/collect_cros_au_log?pid=30450&host_name=chromeos4-row12-rack8-host3"''
01/05 05:33:56.543 DEBUG| dev_server:1617| Saving auto-update logs into /usr/local/autotest/results/hosts/chromeos4-row12-rack8-host3/534169-provision/20170501052348/autoupdate_logs/CrOS_update_chromeos4-row12-rack8-host3_30450.log
01/05 05:33:56.544 DEBUG| dev_server:1884| Exception raised on auto_update attempt #1:
Traceback (most recent call last):
  File "/home/chromeos-test/chromiumos/src/platform/dev/cros_update.py", line 222, in TriggerAU
    self._RootfsUpdate(chromeos_AU)
  File "/home/chromeos-test/chromiumos/src/platform/dev/cros_update.py", line 149, in _RootfsUpdate
    cros_updater.PreSetupRootfsUpdate()
  File "/home/chromeos-test/chromiumos/chromite/lib/auto_updater.py", line 904, in PreSetupRootfsUpdate
    self.device.Reboot(timeout_sec=self.REBOOT_TIMEOUT)
  File "/home/chromeos-test/chromiumos/chromite/lib/remote_access.py", line 817, in Reboot
    return self.GetAgent().RemoteReboot(timeout_sec=timeout_sec)
  File "/home/chromeos-test/chromiumos/chromite/lib/remote_access.py", line 380, in RemoteReboot
    self.RemoteSh('touch %s && reboot' % REBOOT_MARKER)
  File "/home/chromeos-test/chromiumos/chromite/lib/remote_access.py", line 340, in RemoteSh
    raise SSHConnectionError(e.result.error)
SSHConnectionError: Warning: Permanently added 'chromeos4-row12-rack8-host3,100.115.203.51' (RSA) to the list of known hosts.
Write failed: Broken pipe
Jan 5 2017
https://crrev.com/1bc79b267dd32dc69a0f6f4dc873de333841a50c Also available in the linked bug at #11.
Jan 5 2017
I can't log into any servers. According to the log https://storage.cloud.google.com/chromeos-autotest-results/94501594-chromeos-test/chromeos4-row12-rack8-host3/debug/autoserv.DEBUG?_ga=1.21001784.2018250139.1482890129, devserver 100.115.219.134 executed this failed provision. The deputy could check this devserver to see whether its chromite is up to date. If it isn't (probably the case), the deputy could update all devservers with the newest chromite. Reassigning back to the deputy :)
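The devserver can be read off the autoserv.DEBUG lines in the issue description: each devserver RPC is run as `ssh <devserver> 'curl ...'`. A small Python sketch of that lookup, assuming the log line format shown in the issue; `devservers_in_log` is a hypothetical helper, not part of autotest.

```python
import re

# Assumption: lines look like the autoserv.DEBUG excerpt in the issue, e.g.
#   ... base_utils:0185| Running 'ssh 100.115.219.134 'curl "http://..."''
_DEVSERVER_RE = re.compile(r"Running 'ssh (\S+) 'curl ")


def devservers_in_log(lines):
    """Return the distinct hosts autoserv ran devserver curl RPCs on, in order."""
    seen = []
    for line in lines:
        match = _DEVSERVER_RE.search(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen
```

Feeding it the autoserv.DEBUG lines from this provision should yield only 100.115.219.134, the devserver to inspect.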
Jan 5 2017
According to Kevin's latest push_to_prod, the change should have been pushed:
chromite: git log --oneline c2e9734..c559824
c559824 Use explicit virtualenv in virtualenv_wrapper
xixuan@, could you paste the steps to check the chromite version on a devserver? I can help with the checking.
Jan 5 2017
I think push-to-prod doesn't update devservers; it only updates drones and shards. Log into the devserver and check the chromite directory. I don't remember exactly where it is, but it should be at an obvious path like ~/chromiumos/chromite. Check chromite/lib/remote_access.py: if it doesn't match the version in https://chromium-review.googlesource.com/#/c/413632/ and still contains 'touch /tmp/awaiting_reboot && reboot', this devserver's chromite hasn't been updated.
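The file check described above can be scripted. A minimal Python sketch, assuming the checkout path suggested in the thread and assuming that the fixed remote_access.py no longer references REBOOT_MARKER or /tmp/awaiting_reboot (verify this against CL 413632 before relying on it); `stale_markers_in` and `check_remote_access` are hypothetical helpers.

```python
import os

# Assumption: path suggested in the thread; the traceback shows
# /home/chromeos-test/chromiumos/chromite on the failing devserver.
DEFAULT_PATH = os.path.expanduser("~/chromiumos/chromite/lib/remote_access.py")

# Assumption: these strings only occur in the pre-fix reboot code.
STALE_MARKERS = ("REBOOT_MARKER", "/tmp/awaiting_reboot")


def stale_markers_in(source):
    """Return the stale marker strings that occur in `source`."""
    return [marker for marker in STALE_MARKERS if marker in source]


def check_remote_access(path=DEFAULT_PATH):
    """Hypothetical helper: return True if the file at `path` looks updated."""
    with open(path, encoding="utf-8") as f:
        hits = stale_markers_in(f.read())
    if hits:
        print("STALE: %s still contains %s" % (path, hits))
        return False
    print("OK: %s has no reboot-marker code" % path)
    return True
```

On a devserver, call `check_remote_access()` (or pass the real path if chromite lives elsewhere); a False result means that devserver still needs a chromite update.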
Jan 5 2017
ok, I'll check.
Jan 5 2017
This devserver's chromite is behind that commit. I'll update the devservers.
Jan 5 2017
Jan 9 2017
Updated the devservers, except the following ones:
"description": "Failed to update following devservers ['100.115.24.253', '172.25.65.106', '172.27.215.248', '172.27.215.252']"
shuqianz@, please advise how to update individual devservers.
Jan 9 2017
Support for updating individual devservers hasn't been added yet; CL https://chrome-internal-review.googlesource.com/#/c/310302/ is under review. For now, first debug why these devservers failed to update, fix that, and kick off another devserver update, which will update all devservers again. Most of the time, a devserver fails to update because it is offline.
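Since the usual cause is an offline devserver, a quick first triage step is to see which of the failed hosts accept TCP connections on the SSH port at all. A minimal Python sketch, assuming SSH listens on the default port 22; `is_reachable` and `triage` are hypothetical helpers, not part of the devserver push tooling, and a successful connect only shows the port is open, not that login works.

```python
import socket

# Hosts from the "Failed to update following devservers" message above.
FAILED_DEVSERVERS = ["100.115.24.253", "172.25.65.106",
                     "172.27.215.248", "172.27.215.252"]


def is_reachable(host, port=22, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unresolvable, or no route
        return False


def triage(hosts, port=22, timeout=5.0):
    """Split hosts into (offline, reachable) lists."""
    offline, reachable = [], []
    for host in hosts:
        if is_reachable(host, port, timeout):
            reachable.append(host)
        else:
            offline.append(host)
    return offline, reachable
```

`offline, reachable = triage(FAILED_DEVSERVERS)` separates hosts that need to be brought back online from hosts whose update log should be read for another cause.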
Jan 11 2017
Passing to the current deputy. '100.115.24.253', '172.27.215.248', and '172.27.215.252' are not SSH-able. '172.25.65.106' failed to update because the virtualenv wasn't set up properly on that server, so updating the virtualenv failed.
Here is the error log for '172.25.65.106':
[172.25.65.106] run: git stash
[172.25.65.106] out: /bin/bash: line 0: cd: /usr/local/google/chromeos/infra_virtualenv: No such file or directory
[172.25.65.106] out:
Fatal error: run() received nonzero return code 1 while executing!
Requested: git stash
Executed: /bin/bash -l -c "cd /usr/local/google/chromeos/infra_virtualenv >/dev/null && git stash"
Aborting.
ERROR:root:Traceback (most recent call last):
File "/usr/local/google/home/shuqianz/chromiumos/chromeos-admin/server-management-lib/server_management_lib/tasks/atomic_common.py", line 113, in decorated_func
func(self)
File "/usr/local/google/home/shuqianz/chromiumos/chromeos-admin/server-management-lib/server_management_lib/tasks/atomic_devserver.py", line 163, in run
self._update_all_devservers()
File "/usr/local/google/home/shuqianz/chromiumos/chromeos-admin/server-management-lib/server_management_lib/tasks/atomic_devserver.py", line 157, in _update_all_devservers
'Failed to update following devservers %s' % fail_devservers)
TaskRunFailure: Failed to update following devservers ['172.25.65.106']
ERROR:root:Failed to update following devservers ['172.25.65.106']
INFO:root:Printing out task report.
{
"sub_reports": [],
"exception": "TaskRunFailure(\"Failed to update following devservers ['172.25.65.106']\",)",
"is_successful": false,
"description": "Failed to update following devservers ['172.25.65.106']",
"arguments_used": {
"update_devserver_list": [
"172.25.65.106"
]
},
"task_name": "DevserverPushTask"
cc Allen to take a look at this virtualenv problem.
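The failing hosts appear only inside the task report's "description" (and "exception") strings, so a small parser helps when collecting them for follow-up before re-running the push. A minimal Python sketch, assuming the message format shown in the DevserverPushTask report above; `parse_failed_devservers` is a hypothetical helper. `ast.literal_eval` is used because the list is printed as a Python literal.

```python
import ast
import re

# Assumption: messages look like the DevserverPushTask report above.
_FAILED_RE = re.compile(r"Failed to update following devservers (\[.*\])")


def parse_failed_devservers(message):
    """Return the devserver IPs listed in a DevserverPushTask failure message.

    Returns an empty list when the message does not match the expected format.
    """
    match = _FAILED_RE.search(message)
    if match is None:
        return []
    return list(ast.literal_eval(match.group(1)))
```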
Jan 11 2017
172.27.215.248 is the only machine from above which is still offline. I'll follow up on why.
Jan 11 2017
The following revision refers to this bug: https://chrome-internal.googlesource.com/chromeos/chromeos-admin/+/6641de3fec1d6ce188683d8293b30737d1d043f2 commit 6641de3fec1d6ce188683d8293b30737d1d043f2 Author: Allen Li <ayatane@chromium.org> Date: Wed Jan 11 20:59:32 2017
Jan 11 2017
I filed b/34225314 to cover the machine that is still down.
Apr 13 2017
I'm going to assume that these fixes have been properly pushed by now.
Comment 1 by nxia@chromium.org, Jan 5 2017