afe_lock_machine does not work properly.
Issue description
src/third_party/toolchain-utils/afe_lock_machine.py --status --chromeos_root /ssd/clean/ --remote chromeos2-row9-rack9-host9.cros
does not show any result.
It seems that RpcClient::run in src/third_party/autotest/files/server/frontend.py
does not return correctly.
It throws an exception in this try block and keeps retrying, so
it gets stuck here:
    try:
        result = utils.strip_unicode(rpc_call(**dargs))
        if self.reply_debug:
            print result
        return result
    except Exception:
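For illustration only, here is a minimal sketch of how a retry loop like the one described above could be bounded so that it surfaces the underlying error instead of appearing to hang. The helper name `call_with_bounded_retry` and its parameters are hypothetical; this is not the code in frontend.py, and it is written for Python 3 rather than the Python 2 of the quoted snippet.

```python
import time
import traceback


def call_with_bounded_retry(rpc_call, max_attempts=3, delay_secs=0.0, **dargs):
    """Call rpc_call(**dargs), giving up after max_attempts failures.

    Hypothetical helper for illustration; not part of autotest.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return rpc_call(**dargs)
        except Exception as e:
            last_error = e
            # Print the traceback so the real RPC failure is visible
            # instead of being swallowed by the retry loop.
            traceback.print_exc()
            if attempt < max_attempts:
                time.sleep(delay_secs)
    raise RuntimeError(
        'RPC failed after %d attempts: %r' % (max_attempts, last_error))
```

With a cap on attempts and the traceback printed, a failing RPC ends with a visible error rather than an apparently stuck script.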
,
Sep 6 2017
Just FYI...it looks like our script was working fine up through Aug. 31, and started failing on Sept. 1.
,
Sep 6 2017
Do you have a stack trace for the RPC (i.e., retry eventually fails, but do you know what the RPC is failing with?)
,
Sep 6 2017
It seems this error only happens when I use the hostname (instead of the IP address) of the machines in the lab. The IP of chromeos2-row9-rack9-host17.cros is 100.115.232.97.
./afe_lock_machine.py --add --chromeos_root /usr/local/google/crostc/chromeos --remote chromeos2-row9-rack9-host17.cros (failed)
./afe_lock_machine.py --add --chromeos_root /usr/local/google/crostc/chromeos --remote 100.115.232.97 (successful)
./afe_lock_machine.py --add --chromeos_root /usr/local/google/crostc/chromeos --remote yunlian.svl (successful)
,
Sep 6 2017
That is because when you use the name, it tries to go through the HW lab AFE server; when you use the IP address, it uses our local AFE server, which (by the way) is probably using older code, since I'm not sure when the chroot on chrotomation2 was last 'repo sync'd. It might be that if we do a repo sync on chrotomation2, the local AFE server will stop working as well...
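As a rough sketch of the routing described here: the rule below is an assumption inferred only from the three commands reported in this thread (the `.cros` name failed, the IP address and `yunlian.svl` succeeded). The function name `pick_afe` and the `'lab'`/`'local'` labels are hypothetical; how afe_lock_machine.py actually decides is not shown in this thread.

```python
def pick_afe(remote):
    """Guess which AFE server a --remote argument would be routed to.

    ASSUMPTION: lab machines are recognized by a '.cros' hostname suffix;
    anything else (IP address, non-lab hostname) goes to the local server.
    """
    if remote.endswith('.cros'):
        return 'lab'
    return 'local'
```

Under this assumption, the lab machine's IP address reaches the local AFE server even though it names the same machine, which matches the failed and successful runs reported earlier.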
,
Sep 6 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/third_party/toolchain-utils/+/4260aae2fe0abb937543c6cafc44f68e99de7410

commit 4260aae2fe0abb937543c6cafc44f68e99de7410
Author: Yunlian Jiang <yunlian@google.com>
Date: Wed Sep 06 23:18:18 2017

USE local lock for buildbot_test_toolchain.

Currently the AFE lock machanism for our nightly test is broken,
we use local lock as a workaround for now.

BUG= chromium:762589
TEST=Generate the crosperf command in another simple python file
with the same code. Run the generated command line on
crotomation2 and it goes though the locking machine stage.

Change-Id: Icd3132bc383b63aab6d6f5237a04348f11d8726d
Reviewed-on: https://chromium-review.googlesource.com/653213
Commit-Ready: Yunlian Jiang <yunlian@chromium.org>
Tested-by: Yunlian Jiang <yunlian@chromium.org>
Reviewed-by: Caroline Tice <cmtice@chromium.org>

[modify] https://crrev.com/4260aae2fe0abb937543c6cafc44f68e99de7410/buildbot_test_toolchains.py
,
Sep 11 2017
Note that there is a weak correlation between the timing of this breaking and a pesky incorrect DUT locking problem disappearing from the lab: issue 732999. If this script is ever resurrected, we should keep an eye out for that bug to reappear. (That bug is very significant, enough that if it were caused by this script, we'd request suspension of the script.)
,
Sep 27 2017
,
Oct 9 2017
Further investigation reveals: 'atest' only works on machines which are not 'on' BeyondCorp. I can get it to work on my workstation (disabling BeyondCorp), but I have not been able to get it to work on chrotomation2.svl, neither from my own account nor from the role account (mobiletc-prebuild). So 'atest' is not a viable solution for us.

I could try to fix the RPC issues, but the RPC interface is subject to change without notice and we have been told we will get no help from the chromeos-infra team if we try to go that route. Eventually, something called "SkyLab" will come along and "replace all these corp RPCs with oauth-based appengine ones (and will provide replacement tools for atest and similar command line tools)."

So at this point, my recommendation is to keep using the not-perfect-but-it-works file locks mechanism for now, and wait for SkyLab.
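For readers unfamiliar with the idea, a file-lock mechanism of the kind recommended above can be sketched as follows. This is a generic illustration using `fcntl.flock` on Unix, not the toolchain-utils implementation; the function names, the lock directory, and the `.lock` naming scheme are all hypothetical.

```python
import fcntl
import os


def try_lock_machine(lock_dir, machine):
    """Try to take an exclusive lock for `machine`.

    Returns the open lock file on success, or None if the lock could not
    be taken (typically because another holder already has it).
    Hypothetical helper for illustration only.
    """
    path = os.path.join(lock_dir, machine + '.lock')
    lock_file = open(path, 'a')
    try:
        # LOCK_NB makes flock fail immediately instead of blocking.
        fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        lock_file.close()
        return None
    return lock_file


def unlock_machine(lock_file):
    """Release a lock returned by try_lock_machine."""
    fcntl.flock(lock_file, fcntl.LOCK_UN)
    lock_file.close()
```

A lock like this only coordinates processes that can see the same lock directory, which is one reason it is "not perfect" compared with locking the machine in the AFE itself.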
,
Oct 9 2017
The BeyondCorp conclusions are accurate: b/32303896. I think it's best to punt on resurrecting this until SkyLab. Note that, at that time, it's best to work with chromeos-infra to support your use case fully (i.e., SkyLab or not, we are not likely to support, or maybe even allow, automated administrative jobs that modify DUT inventory; locking a DUT counts as such a modification). chromeos-infra should be able to support your use case easily via pools or some other mechanism.
,
Aug 16
Now that chrotomation2 has been moved to MPC, afe_lock_machine is working.
,
Aug 21
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/third_party/toolchain-utils/+/b1afe3f2c2d4219ce490ffa111f530983b171141

commit b1afe3f2c2d4219ce490ffa111f530983b171141
Author: Ting-Yuan Huang <laszio@chromium.org>
Date: Tue Aug 21 23:48:17 2018

Revert "USE local lock for buildbot_test_toolchain."

This reverts commit 4260aae2fe0abb937543c6cafc44f68e99de7410.

Reason for revert: afe_lock_machine is fixed.

Original change's description:
> USE local lock for buildbot_test_toolchain.
>
> Currently the AFE lock machanism for our nightly test is broken,
> we use local lock as a workaround for now.
>
> BUG= chromium:762589
> TEST=Generate the crosperf command in another simple python file
> with the same code. Run the generated command line on
> crotomation2 and it goes though the locking machine stage.
>
> Change-Id: Icd3132bc383b63aab6d6f5237a04348f11d8726d
> Reviewed-on: https://chromium-review.googlesource.com/653213
> Commit-Ready: Yunlian Jiang <yunlian@chromium.org>
> Tested-by: Yunlian Jiang <yunlian@chromium.org>
> Reviewed-by: Caroline Tice <cmtice@chromium.org>

Bug: chromium:762589
Change-Id: Ib9f787ff48953d384dd36c72811ebd0f20dd25db
Reviewed-on: https://chromium-review.googlesource.com/1178897
Tested-by: Ting-Yuan Huang <laszio@chromium.org>
Reviewed-by: Luis Lozano <llozano@chromium.org>
Reviewed-by: Caroline Tice <cmtice@chromium.org>
Commit-Queue: Ting-Yuan Huang <laszio@chromium.org>

[modify] https://crrev.com/b1afe3f2c2d4219ce490ffa111f530983b171141/buildbot_test_toolchains.py
Comment 1 by pprabhu@chromium.org, Sep 6 2017
Status: Assigned (was: Untriaged)