
Issue 649811

Starred by 2 users

Issue metadata

Status: Fixed
Owner:
Closed: Mar 2018
Cc:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: Bug




MobLab Provision with the new flow doesn't work

Project Member Reported by sbasi@chromium.org, Sep 23 2016

Issue description


https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/hosts/chromeos2-row1-rack8-host1/24091-repair/20162309114230/debug/

Is it just me, or is the repair failing the verify checks without actually flashing the device to the stable version? This DUT already has the stable version label; maybe that's why?
 
Project Member

Comment 1 by bugdroid1@chromium.org, Sep 23 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/chromite/+/a1476aadf3a63d1672dc451ac4fb37c3f9ce71c5

commit a1476aadf3a63d1672dc451ac4fb37c3f9ce71c5
Author: Aviv Keshet <akeshet@chromium.org>
Date: Fri Sep 23 21:38:57 2016

chromeos_config: temporarily mark guado_moblab-paladin experimental

BUG= chromium:649811 
TEST=None

Change-Id: Icec02ad4437d0b01e02156b1096f17dce68bd22f
Reviewed-on: https://chromium-review.googlesource.com/388747
Reviewed-by: Simran Basi <sbasi@chromium.org>
Tested-by: Aviv Keshet <akeshet@chromium.org>

[modify] https://crrev.com/a1476aadf3a63d1672dc451ac4fb37c3f9ce71c5/cbuildbot/config_dump.json
[modify] https://crrev.com/a1476aadf3a63d1672dc451ac4fb37c3f9ce71c5/cbuildbot/chromeos_config.py

Project Member

Comment 2 by bugdroid1@chromium.org, Sep 23 2016

The following revision refers to this bug:
  https://chrome-internal.googlesource.com/chromeos/chromeos-admin/+/c1dd81f370cd7dfdaac9c649b67b9dcc84a52c9c

commit c1dd81f370cd7dfdaac9c649b67b9dcc84a52c9c
Author: Richard Barnette <jrbarnette@google.com>
Date: Fri Sep 23 22:38:40 2016

Comment 3 by sbasi@chromium.org, Sep 23 2016

Cc: sbasi@chromium.org
Owner: xixuan@chromium.org
Status: Assigned (was: Untriaged)
Summary: MobLab Provision with the new flow doesn't work (was: MobLab Repair doesn't appear to be working)
I kicked off a repair, fixed the host, and it began testing again!

http://cautotest.corp.google.com/afe/#tab_id=view_host&object_id=1401

Xixuan, your new provision flow has a problem with MobLab.

Comment 4 by xixuan@chromium.org, Sep 29 2016

After a long debugging session, I found the cause: I delete the "old autotest directory" at the end of provisioning, and I don't pass the correct "old autotest directory" to that deletion.

The previous provision code, autoupdater.py, also contains this step. It did not hit this problem because, in that case, the "old autotest directory" is an attribute of the moblab host, and it is a temporary folder.

The solution is to remove the "delete the old autotest directory" code, since it's not needed in provisioning.
Project Member

Comment 5 by bugdroid1@chromium.org, Oct 1 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/chromite/+/f00084c5d05eb0ec1c72f0c8bf6e71a9e29239fa

commit f00084c5d05eb0ec1c72f0c8bf6e71a9e29239fa
Author: xixuan <xixuan@chromium.org>
Date: Thu Sep 29 21:18:21 2016

auto_updater: Don't delete autotest directory in CrOS auto-update.

Cros-flash based provision fail in moblab host because the autotest directory
is wrongly deleted at the final step of auto-update process. This CL removes
these codes since 'cleaning old autotest directory' is not needed in provision.

BUG= chromium:649811 
TEST=Use provision code to do cros flash on chromeos2-row2-rack8-host11, and
run 'status moblab-scheduler-init' on the host.

Change-Id: I04a3401b0dbfb40d24a048afdf44ba4955987db4
Reviewed-on: https://chromium-review.googlesource.com/391072
Commit-Ready: Xixuan Wu <xixuan@chromium.org>
Tested-by: Xixuan Wu <xixuan@chromium.org>
Reviewed-by: Allen Li <ayatane@chromium.org>
Reviewed-by: Richard Barnette <jrbarnette@google.com>

[modify] https://crrev.com/f00084c5d05eb0ec1c72f0c8bf6e71a9e29239fa/lib/auto_updater.py

Project Member

Comment 6 by bugdroid1@chromium.org, Oct 5 2016

The following revision refers to this bug:
  https://chrome-internal.googlesource.com/chromeos/chromeos-admin/+/9e41745165cb031d5607b78fbdd050db375bc27c

commit 9e41745165cb031d5607b78fbdd050db375bc27c
Author: xixuan <xixuan@chromium.org>
Date: Wed Oct 05 20:11:23 2016

This issue is still there. Most of the time the same failure happens; sometimes it changes to a different error.

Most of the time, the error looks like:

https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/hosts/chromeos2-row2-rack8-host5/50172-provision/20160610121949/debug/

'moblab-scheduler-init' still doesn't start.

Other times, the error changes to:

https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/hosts/chromeos2-row2-rack8-host3/49258-provision/20160610012513/debug/

10/06 01:33:08.640 DEBUG|        base_utils:0280| [stdout] 192.168.231.120 is unreachable
10/06 01:33:08.682 DEBUG|        retry_util:0115| <class 'socket.error'>([Errno 104] Connection reset by peer)
10/06 01:33:18.732 DEBUG|        retry_util:0115| <class 'urllib2.URLError'>(<urlopen error [Errno 111] Connection refused>)
10/06 01:33:38.833 DEBUG|        retry_util:0115| <class 'urllib2.URLError'>(<urlopen error [Errno 111] Connection refused>)
10/06 01:34:08.642 ERROR|            repair:0313| Failed: Legacy host verification checks
...
TimeoutError: Timeout occurred- waited 60 seconds.
10/06 01:34:08.646 INFO |        server_job:0153| 	FAIL	----	verify.cros	timestamp=1475742848	localtime=Oct 06 01:34:08	Timeout occurred- waited 60 seconds.


The CrOS update error is more detailed now. Does anyone know what leads to these different errors?
I set up a local autotest instance, added a moblab host to it, and kicked off a repair job on the host.

Due to errors, repair.au is kicked off, and provision reports success. However, the host then fails verify.cros with this error:

10/07 15:08:15.062 DEBUG|          ssh_host:0180| Running (ssh) 'status moblab-scheduler-init | grep start/running'
10/07 15:08:15.106 ERROR|            repair:0313| Failed: Legacy host verification checks
Traceback (most recent call last):
  File "/usr/local/autotest/client/common_lib/hosts/repair.py", line 310, in _verify_host
    self.verify(host)
  File "/usr/local/autotest/server/hosts/repair.py", line 55, in verify
    host.verify_software()
  File "/usr/local/autotest/server/hosts/moblab_host.py", line 241, in verify_software
    self._verify_moblab_services()
  File "/usr/local/autotest/server/hosts/moblab_host.py", line 251, in _verify_moblab_services
    if not self.upstart_status(service):
  File "/usr/local/autotest/server/hosts/cros_host.py", line 1482, in upstart_status
    service_name).stdout.strip() != ''
  File "/usr/local/autotest/server/hosts/ssh_host.py", line 190, in run
    options, stdin, args, ignore_timeout)
  File "/usr/local/autotest/server/hosts/ssh_host.py", line 157, in _run
    raise error.AutoservRunError("command execution error", result)


Debugging further, the only finding is that moblab-scheduler-init seems to take a long time to come up under the new provision framework. After I added a time.sleep(100) in auto_update, the check passed.

All commands that run on a moblab host for the 'stateful update' are pasted in https://docs.google.com/document/d/1nBrilayofj9zrMMN7HhnllU35zBMi7MGk3wFjLnr5PE/edit, for BOTH the OLD and NEW frameworks.

Not sure why the new framework behaves differently from the old one.

Project Member

Comment 9 by bugdroid1@chromium.org, Oct 12 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/autotest/+/9025d9f9fd78f3111223c998ebcd86903e900b17

commit 9025d9f9fd78f3111223c998ebcd86903e900b17
Author: xixuan <xixuan@chromium.org>
Date: Fri Oct 07 23:10:59 2016

autotest: add a retry for upstart service check of moblab.

New provision framework makes restarting moblab upstart service much longer
than before. We haven't figure it out yet.

This CL is a fix for that. We add a retry for all upstart service check
commands on moblab, and set timeout=2 mins to make sure that this verify check
will pass.

BUG= chromium:649811 
TEST=Run a repair job on moblab host of local autotest. Verified that it passes
the upstart service checks.

Change-Id: I2068eef5410aecd9915608bfb5ef6002ba2cf1bf
Reviewed-on: https://chromium-review.googlesource.com/395547
Commit-Ready: Xixuan Wu <xixuan@chromium.org>
Tested-by: Xixuan Wu <xixuan@chromium.org>
Reviewed-by: Simran Basi <sbasi@chromium.org>

[modify] https://crrev.com/9025d9f9fd78f3111223c998ebcd86903e900b17/server/hosts/moblab_host.py
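For readers following along, here is a rough sketch of the kind of retry the commit message describes: poll a service check until it passes or a 2-minute timeout elapses, rather than sleeping for a fixed time. This is not the code from moblab_host.py; the function name wait_for_service, its parameters, and the 5-second interval are assumptions made for illustration only.

```python
import time


def wait_for_service(check, timeout_s=120.0, interval_s=5.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll check() until it returns True or timeout_s elapses.

    Hypothetical helper: `check` is any zero-argument callable, for
    example one that runs 'status moblab-scheduler-init' on the host
    and reports whether the output contains 'start/running'.
    Returns True if the check passed in time, False otherwise.
    """
    deadline = clock() + timeout_s
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        # Wait before the next attempt so the host is not hammered.
        sleep(interval_s)
```

Compared with the time.sleep(100) experiment earlier in this thread, polling returns as soon as the service is up and only pays the full timeout when the service never starts.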

Project Member

Comment 10 by bugdroid1@chromium.org, Oct 21 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/chromite/+/9e46e53357153310fe5c75ff94c0980c75c8b947

commit 9e46e53357153310fe5c75ff94c0980c75c8b947
Author: Simran Basi <sbasi@google.com>
Date: Tue Oct 18 16:36:33 2016

Revert "chromeos_config: temporarily mark guado_moblab-paladin experimental"

This reverts commit a1476aadf3a63d1672dc451ac4fb37c3f9ce71c5.

BUG= chromium:649811 
TEST=None
Change-Id: I535e4c37cece46922f97136e79adc46a231f2e8b
Reviewed-on: https://chromium-review.googlesource.com/400007
Commit-Ready: Simran Basi <sbasi@chromium.org>
Tested-by: Simran Basi <sbasi@chromium.org>
Reviewed-by: Michael Tang <ntang@chromium.org>
Reviewed-by: Dan Shi <dshi@google.com>

[modify] https://crrev.com/9e46e53357153310fe5c75ff94c0980c75c8b947/cbuildbot/config_dump.json
[modify] https://crrev.com/9e46e53357153310fe5c75ff94c0980c75c8b947/cbuildbot/chromeos_config.py

Project Member

Comment 11 by bugdroid1@chromium.org, Nov 1 2016

The following revision refers to this bug:
  https://chrome-internal.googlesource.com/chromeos/chromeos-admin/+/25fa4a005db8e95ac0c295b527c975bec96bf7a7

commit 25fa4a005db8e95ac0c295b527c975bec96bf7a7
Author: xixuan <xixuan@chromium.org>
Date: Tue Nov 01 18:26:29 2016

Project Member

Comment 12 by bugdroid1@chromium.org, Dec 2 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/autotest/+/971fe949e781ba67a5dd6f5f9a4f0a38a3ae713b

commit 971fe949e781ba67a5dd6f5f9a4f0a38a3ae713b
Author: xixuan <xixuan@chromium.org>
Date: Thu Nov 03 17:15:52 2016

Autotest: Let moblab host restart AFE tunnel if needed before verifying.

There's a case that before verifying a moblab host, its AFE tunnel is closed
for some reasons, like unexpected reboot without using |reboot| func in
moblab_host.py

This CL restart the AFE if it is down due to a closed tunnel.

BUG= chromium:649811 
TEST=Run "/usr/local/autotest/server/autoserv -p -r /tmp/moblab-test7 -m
chromeos2-row1-rack8-host1 --verbose --lab True --provision --job-labels
cros-version:guado_moblab-paladin/R56-8953.0.0-rc4" with new change on
chromeos-server36.cbf (the guado shard)

Change-Id: Icb8eb7e7278699557f3020b14eae4f63a2f3f7d8
Reviewed-on: https://chromium-review.googlesource.com/406639
Commit-Ready: Xixuan Wu <xixuan@chromium.org>
Tested-by: Xixuan Wu <xixuan@chromium.org>
Reviewed-by: Simran Basi <sbasi@chromium.org>

[modify] https://crrev.com/971fe949e781ba67a5dd6f5f9a4f0a38a3ae713b/server/hosts/moblab_host.py
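As a rough illustration of the "restart the tunnel only if it is down" idea in the commit message above, the sketch below probes a TCP port and triggers a restart callback when the probe fails. It is not the moblab_host.py implementation; port_is_open, ensure_tunnel, and the callback-based design are assumptions chosen to keep the example self-contained.

```python
import socket


def port_is_open(host, port, timeout_s=2.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


def ensure_tunnel(is_up, restart):
    """Restart the tunnel only when it is down.

    Hypothetical helper: `is_up` and `restart` are zero-argument
    callables supplied by the caller. Returns True if a restart was
    triggered, False if the tunnel was already up.
    """
    if is_up():
        return False
    restart()
    return True
```

A caller might pass something like `lambda: port_is_open("localhost", some_local_port)` as `is_up`, where `some_local_port` stands for whatever local port the tunnel forwards; that port is not stated in this thread.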

Status: Fixed (was: Assigned)
