MobLab Provision with the new flow doesn't work
Issue description
https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/hosts/chromeos2-row1-rack8-host1/24091-repair/20162309114230/debug/
Is it just me, or is it failing the verify checks without actually flashing the device to the stable version? This DUT already has the stable version label; maybe that's why?
Sep 23 2016
The following revision refers to this bug:
https://chrome-internal.googlesource.com/chromeos/chromeos-admin/+/c1dd81f370cd7dfdaac9c649b67b9dcc84a52c9c

commit c1dd81f370cd7dfdaac9c649b67b9dcc84a52c9c
Author: Richard Barnette <jrbarnette@google.com>
Date: Fri Sep 23 22:38:40 2016
Sep 23 2016
I kicked off a repair, fixed the host, and it began testing again!
http://cautotest.corp.google.com/afe/#tab_id=view_host&object_id=1401
Xixuan, your new provision flow has a problem with MobLab.
Sep 29 2016
After a long and confusing debugging session, I found the cause: at the end of provisioning, the new flow deletes the "old autotest directory", and I don't pass the correct "old autotest directory" for this deletion. The previous provision code (autoupdater.py) also contains this step. The problem doesn't happen there because, in that case, the "old autotest directory" is an attribute of the moblab host and points to a temporary folder. The fix is to remove the "delete the old autotest directory" code, since that step isn't needed in provisioning.
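The failure mode above comes down to deleting a directory whose path was not what the caller expected. As a hypothetical illustration only (the actual fix simply removed the deletion step), a guard could refuse to delete anything that does not resolve to a location under the system temp directory. The function name `is_safe_to_delete` and the protected path `/usr/local/autotest` are assumptions for this sketch, not code from autotest or chromite.

```python
import os
import tempfile


def is_safe_to_delete(path, protected=("/usr/local/autotest",)):
    """Hypothetical guard: return True only if `path` resolves to a location
    strictly inside the system temp directory and is not a protected
    install directory such as the live autotest checkout."""
    real = os.path.realpath(path)
    temp_root = os.path.realpath(tempfile.gettempdir())
    # Never allow deleting a protected install directory.
    for p in protected:
        if real == os.path.realpath(p):
            return False
    # Allow only paths below (not equal to) the temp root.
    return real != temp_root and os.path.commonpath([real, temp_root]) == temp_root
```

A caller would check `is_safe_to_delete(old_dir)` before calling `shutil.rmtree(old_dir)`. Removing the step entirely, as the commit below does, is simpler when the cleanup serves no purpose in provisioning.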
Oct 1 2016
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/chromite/+/f00084c5d05eb0ec1c72f0c8bf6e71a9e29239fa

commit f00084c5d05eb0ec1c72f0c8bf6e71a9e29239fa
Author: xixuan <xixuan@chromium.org>
Date: Thu Sep 29 21:18:21 2016

auto_updater: Don't delete autotest directory in CrOS auto-update.

Cros-flash based provision fail in moblab host because the autotest directory is wrongly deleted at the final step of auto-update process. This CL removes these codes since 'cleaning old autotest directory' is not needed in provision.

BUG= chromium:649811
TEST=Use provision code to do cros flash on chromeos2-row2-rack8-host11, and run 'status moblab-scheduler-init' on the host.

Change-Id: I04a3401b0dbfb40d24a048afdf44ba4955987db4
Reviewed-on: https://chromium-review.googlesource.com/391072
Commit-Ready: Xixuan Wu <xixuan@chromium.org>
Tested-by: Xixuan Wu <xixuan@chromium.org>
Reviewed-by: Allen Li <ayatane@chromium.org>
Reviewed-by: Richard Barnette <jrbarnette@google.com>

[modify] https://crrev.com/f00084c5d05eb0ec1c72f0c8bf6e71a9e29239fa/lib/auto_updater.py
Oct 5 2016
The following revision refers to this bug:
https://chrome-internal.googlesource.com/chromeos/chromeos-admin/+/9e41745165cb031d5607b78fbdd050db375bc27c

commit 9e41745165cb031d5607b78fbdd050db375bc27c
Author: xixuan <xixuan@chromium.org>
Date: Wed Oct 05 20:11:23 2016
Oct 6 2016
This issue is still there. Most of the time it fails the same way, with 'moblab-scheduler-init' still not starting, for example:
https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/hosts/chromeos2-row2-rack8-host5/50172-provision/20160610121949/debug/
Other times, the error changes to:
https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/hosts/chromeos2-row2-rack8-host3/49258-provision/20160610012513/debug/
10/06 01:33:08.640 DEBUG| base_utils:0280| [stdout] 192.168.231.120 is unreachable
10/06 01:33:08.682 DEBUG| retry_util:0115| <class 'socket.error'>([Errno 104] Connection reset by peer)
10/06 01:33:18.732 DEBUG| retry_util:0115| <class 'urllib2.URLError'>(<urlopen error [Errno 111] Connection refused>)
10/06 01:33:38.833 DEBUG| retry_util:0115| <class 'urllib2.URLError'>(<urlopen error [Errno 111] Connection refused>)
10/06 01:34:08.642 ERROR| repair:0313| Failed: Legacy host verification checks
...
TimeoutError: Timeout occurred- waited 60 seconds.
10/06 01:34:08.646 INFO | server_job:0153| FAIL ---- verify.cros timestamp=1475742848 localtime=Oct 06 01:34:08 Timeout occurred- waited 60 seconds.
The CrOS Update error is more detailed now. Does anyone know what leads to these different errors?
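The retry_util timestamps in the second log suggest waits of roughly 10, 20 and 30 seconds between attempts, with the whole check giving up after 60 seconds. The sketch below illustrates that pattern (linearly growing delay under an overall time budget). It is not chromite's retry_util; `retry_with_backoff` and `RetryTimeout` are hypothetical names, and `sleep`/`clock` are injectable only so the behavior can be checked without real waiting.

```python
import time


class RetryTimeout(Exception):
    """Raised when the overall time budget is exhausted."""


def retry_with_backoff(func, exceptions, timeout=60.0, delay=10.0,
                       sleep=time.sleep, clock=time.monotonic):
    """Call func() until it succeeds, sleeping delay, 2*delay, 3*delay, ...
    between failed attempts, and give up once `timeout` seconds elapse."""
    start = clock()
    attempt = 0
    while True:
        try:
            return func()
        except exceptions as e:
            attempt += 1
            remaining = timeout - (clock() - start)
            if remaining <= 0:
                raise RetryTimeout("gave up after %g seconds" % timeout) from e
            # Linearly growing wait, capped so we never overshoot the budget.
            sleep(min(delay * attempt, remaining))
```

With the defaults, a target that keeps refusing connections is retried after 10, 20 and 30 seconds and then abandoned at the 60-second mark, which matches the shape of the log above.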
Oct 7 2016
I set up a local autotest instance, added a moblab host to it, and kicked off a repair job on the host.
Because of errors, repair.au was kicked off, and the provision step reported success. However, the host then failed verify.cros with this error:
10/07 15:08:15.062 DEBUG| ssh_host:0180| Running (ssh) 'status moblab-scheduler-init | grep start/running'
10/07 15:08:15.106 ERROR| repair:0313| Failed: Legacy host verification checks
Traceback (most recent call last):
  File "/usr/local/autotest/client/common_lib/hosts/repair.py", line 310, in _verify_host
    self.verify(host)
  File "/usr/local/autotest/server/hosts/repair.py", line 55, in verify
    host.verify_software()
  File "/usr/local/autotest/server/hosts/moblab_host.py", line 241, in verify_software
    self._verify_moblab_services()
  File "/usr/local/autotest/server/hosts/moblab_host.py", line 251, in _verify_moblab_services
    if not self.upstart_status(service):
  File "/usr/local/autotest/server/hosts/cros_host.py", line 1482, in upstart_status
    service_name).stdout.strip() != ''
  File "/usr/local/autotest/server/hosts/ssh_host.py", line 190, in run
    options, stdin, args, ignore_timeout)
  File "/usr/local/autotest/server/hosts/ssh_host.py", line 157, in _run
    raise error.AutoservRunError("command execution error", result)
After more debugging, the only finding is that moblab-scheduler-init seems to take a long time to come up under the new provision framework. After I added a time.sleep(100) in auto_update, the check passed.
All commands run on a moblab host during the 'stateful update' step, for BOTH the OLD and the NEW framework, are pasted in https://docs.google.com/document/d/1nBrilayofj9zrMMN7HhnllU35zBMi7MGk3wFjLnr5PE/edit
I'm not sure why the new framework behaves differently from the old one.
Oct 12 2016
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/third_party/autotest/+/9025d9f9fd78f3111223c998ebcd86903e900b17

commit 9025d9f9fd78f3111223c998ebcd86903e900b17
Author: xixuan <xixuan@chromium.org>
Date: Fri Oct 07 23:10:59 2016

autotest: add a retry for upstart service check of moblab.

New provision framework makes restarting moblab upstart service much longer than before. We haven't figure it out yet. This CL is a fix for that. We add a retry for all upstart service check commands on moblab, and set timeout=2 mins to make sure that this verify check will pass.

BUG= chromium:649811
TEST=Run a repair job on moblab host of local autotest. Verified that it passes the upstart service checks.

Change-Id: I2068eef5410aecd9915608bfb5ef6002ba2cf1bf
Reviewed-on: https://chromium-review.googlesource.com/395547
Commit-Ready: Xixuan Wu <xixuan@chromium.org>
Tested-by: Xixuan Wu <xixuan@chromium.org>
Reviewed-by: Simran Basi <sbasi@chromium.org>

[modify] https://crrev.com/9025d9f9fd78f3111223c998ebcd86903e900b17/server/hosts/moblab_host.py
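The commit above replaces a one-shot check with a retry under a 2-minute timeout. A minimal sketch of that idea, assuming the check is the same `status <service>` / `start/running` test seen in the verify log: poll each upstart service until it reports running or the deadline passes, rather than sleeping a fixed 100 seconds. `run` is a hypothetical callable that executes a command on the DUT and returns its stdout; `upstart_running` and `wait_for_services` are illustrative names, not the actual moblab_host.py code.

```python
import time


def upstart_running(run, service):
    """Return True if `status <service>` reports start/running.

    `run` is a hypothetical callable: run(cmd) -> stdout string.
    """
    return "start/running" in run("status %s" % service)


def wait_for_services(run, services, timeout=120.0, interval=5.0,
                      sleep=time.sleep, clock=time.monotonic):
    """Poll upstart services until all are running or `timeout` expires.

    Returns the services still not running (an empty list on success).
    """
    deadline = clock() + timeout
    pending = list(services)
    while True:
        # Re-check only the services that were not yet running.
        pending = [s for s in pending if not upstart_running(run, s)]
        if not pending or clock() >= deadline:
            return pending
        sleep(interval)
```

Polling returns as soon as the services come up, so a healthy host pays only a few seconds, while a slow start under the new provision flow still gets the full two minutes.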
Oct 21 2016
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/chromite/+/9e46e53357153310fe5c75ff94c0980c75c8b947

commit 9e46e53357153310fe5c75ff94c0980c75c8b947
Author: Simran Basi <sbasi@google.com>
Date: Tue Oct 18 16:36:33 2016

Revert "chromeos_config: temporarily mark guado_moblab-paladin experimental"

This reverts commit a1476aadf3a63d1672dc451ac4fb37c3f9ce71c5.

BUG= chromium:649811
TEST=None

Change-Id: I535e4c37cece46922f97136e79adc46a231f2e8b
Reviewed-on: https://chromium-review.googlesource.com/400007
Commit-Ready: Simran Basi <sbasi@chromium.org>
Tested-by: Simran Basi <sbasi@chromium.org>
Reviewed-by: Michael Tang <ntang@chromium.org>
Reviewed-by: Dan Shi <dshi@google.com>

[modify] https://crrev.com/9e46e53357153310fe5c75ff94c0980c75c8b947/cbuildbot/config_dump.json
[modify] https://crrev.com/9e46e53357153310fe5c75ff94c0980c75c8b947/cbuildbot/chromeos_config.py
Nov 1 2016
The following revision refers to this bug:
https://chrome-internal.googlesource.com/chromeos/chromeos-admin/+/25fa4a005db8e95ac0c295b527c975bec96bf7a7

commit 25fa4a005db8e95ac0c295b527c975bec96bf7a7
Author: xixuan <xixuan@chromium.org>
Date: Tue Nov 01 18:26:29 2016
Dec 2 2016
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/third_party/autotest/+/971fe949e781ba67a5dd6f5f9a4f0a38a3ae713b

commit 971fe949e781ba67a5dd6f5f9a4f0a38a3ae713b
Author: xixuan <xixuan@chromium.org>
Date: Thu Nov 03 17:15:52 2016

Autotest: Let moblab host restart AFE tunnel if needed before verifying.

There's a case that before verifying a moblab host, its AFE tunnel is closed for some reasons, like unexpected reboot without using |reboot| func in moblab_host.py This CL restart the AFE if it is down due to a closed tunnel.

BUG= chromium:649811
TEST=Run "/usr/local/autotest/server/autoserv -p -r /tmp/moblab-test7 -m chromeos2-row1-rack8-host1 --verbose --lab True --provision --job-labels cros-version:guado_moblab-paladin/R56-8953.0.0-rc4" with new change on chromeos-server36.cbf (the guado shard)

Change-Id: Icb8eb7e7278699557f3020b14eae4f63a2f3f7d8
Reviewed-on: https://chromium-review.googlesource.com/406639
Commit-Ready: Xixuan Wu <xixuan@chromium.org>
Tested-by: Xixuan Wu <xixuan@chromium.org>
Reviewed-by: Simran Basi <sbasi@chromium.org>

[modify] https://crrev.com/971fe949e781ba67a5dd6f5f9a4f0a38a3ae713b/server/hosts/moblab_host.py
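The commit above restarts the AFE tunnel when it is found closed before verification. One plausible way to detect a closed tunnel, assuming the tunnel is exposed as a local TCP port on the server, is to probe that port and invoke a restart only if the connection fails. `port_is_open`, `ensure_tunnel` and the `restart` callback are hypothetical names for this sketch; it is not the moblab_host.py implementation.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, reset or timed out: treat the tunnel as down.
        return False


def ensure_tunnel(host, port, restart):
    """Call the caller-supplied `restart()` if the tunnel port is closed.

    Returns True if a restart was triggered, False if the tunnel was up.
    """
    if port_is_open(host, port):
        return False
    restart()
    return True
```

Checking before verifying (rather than after a failure) keeps a stale tunnel, for example after a reboot that bypassed moblab_host.py's reboot(), from surfacing as a misleading verify error.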
Mar 19 2018
Comment 1 by bugdroid1@chromium.org, Sep 23 2016