scheduling job from shard can result in a race condition causing an exception: list index out of range
Issue description
https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/82555879-kevcheng/chromeos6-row1-rack2-host5/debug

10/25 10:35:42.042 ERROR| repair:0313| Failed: servo host software is up-to-date
Traceback (most recent call last):
  File "/usr/local/autotest/client/common_lib/hosts/repair.py", line 310, in _verify_host
    self.verify(host)
  File "/usr/local/autotest/server/hosts/servo_repair.py", line 24, in verify
    host.update_image(wait_for_update=False)
  File "/usr/local/autotest/site-packages/statsd/timer.py", line 95, in _decorator
    return function(*args, **kwargs)
  File "/usr/local/autotest/server/hosts/servo_host.py", line 531, in update_image
    status, current_build_number = self._check_for_reboot(updater)
  File "/usr/local/autotest/server/hosts/servo_host.py", line 446, in _check_for_reboot
    self.schedule_synchronized_reboot(dut_list, afe)
  File "/usr/local/autotest/server/hosts/servo_host.py", line 415, in schedule_synchronized_reboot
    control_type=control_type, hosts=[dut])
  File "/usr/local/autotest/server/frontend.py", line 637, in create_job
    return self.get_jobs(id=id)[0]
IndexError: list index out of range

Will investigate why afe.create_job is failing when getting the job after it creates it.
,
Oct 26 2016
+xixuan who did some shard RPC cleanup. Xixuan, maybe you are the better owner of this bug?
,
Oct 27 2016
Re-assigning to xixuan@, who is working on the RPC forward project.
,
Oct 28 2016
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/third_party/autotest/+/79589980308c023b6dbfcece379d72efd505fbfc

commit 79589980308c023b6dbfcece379d72efd505fbfc
Author: Kevin Cheng <kevcheng@chromium.org>
Date: Tue Oct 25 20:26:04 2016

[autotest] Update servo host reboot to talk to cautotest directly.

I've seen the create_job afe call fail which causes the test job to fail. We don't want that to happen so let's catch that exception and log it and just have another dut schedule the reboot for us.

The afe create_job call fails because it looks like the 'create_job' rpc returns with an ID that is not yet available when get_jobs is called and so it returns an empty list and we try to index that and raise an IndexError. This looks to be caused by calling the shard instead of cautotest directly so also change the afe to call cautotest as well.

https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/82555879-kevcheng/chromeos6-row1-rack2-host5/debug

BUG= chromium:599533
BUG= chromium:659720
TEST=None.

Change-Id: I77c7b2b96ffa3b7a9ea2ee7a1c0960c4f9065ba4
Reviewed-on: https://chromium-review.googlesource.com/403290
Commit-Ready: Kevin Cheng <kevcheng@chromium.org>
Tested-by: Kevin Cheng <kevcheng@chromium.org>
Reviewed-by: Kevin Cheng <kevcheng@chromium.org>

[modify] https://crrev.com/79589980308c023b6dbfcece379d72efd505fbfc/server/hosts/servo_host.py
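
For readers following along, the failure mode in the commit message can be imitated with a small stand-in. This is only a sketch under stated assumptions, not autotest code: `FakeShardAfe`, its `lag` parameter, `run_create_job_rpc`, and `schedule_reboot` are hypothetical names invented for illustration, and the only line taken from the traceback is the `get_jobs(id=...)[0]` pattern. It shows how an ID that is not yet visible leads to the IndexError, and roughly what "catch that exception and log it" could look like.

```python
import logging


class FakeShardAfe(object):
    """Hypothetical stand-in for an AFE client that talks to a shard."""

    def __init__(self, lag):
        # Assumption: a new job becomes visible only after `lag` lookups.
        self._lag = lag
        self._pending = {}   # job id -> lookups left before the job is visible
        self._next_id = 1

    def run_create_job_rpc(self):
        """Pretend RPC: hand back an ID before the job can be looked up."""
        job_id = self._next_id
        self._next_id += 1
        self._pending[job_id] = self._lag
        return job_id

    def get_jobs(self, id):
        remaining = self._pending.get(id)
        if remaining is None:
            return []
        if remaining > 0:
            self._pending[id] = remaining - 1
            return []            # ID exists but is not visible yet
        return [{'id': id}]

    def create_job(self):
        job_id = self.run_create_job_rpc()
        # Same indexing pattern as the last frame of the traceback.
        return self.get_jobs(id=job_id)[0]


def schedule_reboot(afe):
    """Hypothetical caller: log the failure instead of failing the job."""
    try:
        return afe.create_job()
    except IndexError:
        logging.warning('create_job returned an ID that get_jobs cannot see yet')
        return None
```

With `FakeShardAfe(lag=1)` the lookup returns an empty list and `schedule_reboot` returns `None`; with `lag=0` it returns the job record.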
,
Dec 8 2016
Is this issue fixed by Kevin's CL? It looks like this CL changes the afe to call cautotest for the 'create_job' rpc.
,
Dec 9 2016
My CL just ignores this issue. This is still a bug in the sense that calling create_job from a shard could fail in the way described in #4's commit message.
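
One possible direction for the underlying race, offered only as a sketch and not as a fix anyone on this bug has proposed: poll the lookup a bounded number of times instead of indexing the first result. `get_job_with_retry` and `make_lagging_get_jobs` are hypothetical helpers, and the `get_jobs(id=...)` call shape is assumed from the traceback.

```python
import time


def get_job_with_retry(get_jobs, job_id, attempts=5, delay_secs=0.0,
                       sleep=time.sleep):
    """Return the first job matching job_id, or None if it never shows up.

    `get_jobs` is any callable accepting id=<job id> and returning a list.
    """
    for attempt in range(attempts):
        jobs = get_jobs(id=job_id)
        if jobs:
            return jobs[0]
        if attempt < attempts - 1:
            sleep(delay_secs)    # give the job time to become visible
    return None


def make_lagging_get_jobs(lag):
    """Hypothetical lookup that returns nothing for the first `lag` calls."""
    calls = {'n': 0}

    def get_jobs(id):
        calls['n'] += 1
        return [{'id': id}] if calls['n'] > lag else []

    return get_jobs
```

A caller would still need to decide what to do when `None` comes back, for example logging and letting another DUT schedule the reboot as the CL does.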
,
Dec 9 2016
Understood. Let's wait for more examples before fixing it :)
,
Dec 13 2016
,
Jan 19 2017
,
Feb 13 2018
Issue has not been modified or commented on in the last 365 days, please re-open or file a new bug if this is still an issue. For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Comment 1 by kevcheng@chromium.org, Oct 26 2016
Owner: shuqianz@chromium.org
Summary: scheduling job from shard can result in a race condition causing an exception (was: when scheduling servo host reboot job, afe.create_job will fail)