tko/parse failure left Skylab bot in dut_state "running"
Issue description
Example task: https://chrome-swarming.appspot.com/task?id=3ea3f9d0d1546210
The tko/parse failure is tracked separately in issue 862431. However, the bot was left in dut_state "running" at the end of this task, which means that no other tasks could be scheduled against it.
,
Jul 11
I think we should just get rid of the running state in Skylab. I don't want to replicate the "DUT state is wrong" mess in Skylab too. Swarming knows whether a bot/DUT has a job or not. We only need markers for whether a DUT is dead or healthy; otherwise it is running if it has a job and ready if not.
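To make the proposal concrete, here is a minimal Go sketch of deriving the displayed state instead of storing it. All names here (Health, DUT, HasJob, DerivedState) are hypothetical and are not taken from the Skylab or lucifer code; HasJob stands in for asking Swarming whether the bot currently has a task.

```go
package main

import "fmt"

// Health is the only per-DUT marker persisted in this sketch (hypothetical type).
type Health int

const (
	Healthy Health = iota
	Dead
)

// DUT is a hypothetical record. HasJob stands in for a query to Swarming
// about whether the bot currently has a task; it is never stored as state.
type DUT struct {
	Health Health
	HasJob bool
}

// DerivedState computes the state to display. Because "running" is derived
// from HasJob rather than stored, there is no stored value that a failed
// task could leave behind.
func DerivedState(d DUT) string {
	if d.Health == Dead {
		return "dead"
	}
	if d.HasJob {
		return "running"
	}
	return "ready"
}

func main() {
	fmt.Println(DerivedState(DUT{Health: Healthy, HasJob: true}))
	fmt.Println(DerivedState(DUT{Health: Healthy, HasJob: false}))
	fmt.Println(DerivedState(DUT{Health: Dead, HasJob: false}))
}
```

In this sketch a dead DUT is reported as dead regardless of HasJob; that precedence is an assumption of the example, not something decided in this thread.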
,
Jul 11
This does not affect Autotest/Moblab, since job_aborter makes sure to not leave dangling jobs/hosts.
,
Jul 11
I think I can fix this with a one-line change; I still want to keep the bug open for getting rid of the running state, though.
,
Jul 13
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/infra/lucifer/+/c4f78a783c7cc5847c10f10d87a2258d6fd0f3e0

commit c4f78a783c7cc5847c10f10d87a2258d6fd0f3e0
Author: Allen Li <ayatane@google.com>
Date: Fri Jul 13 01:50:18 2018

lucifer: Mark host running just before autoserv

Mark host running just before autoserv, in particular, after we
potentially return out of doRunningStep without setting the host back
to ready or something else in updateHostState.

BUG= chromium:862793
TEST=None

Change-Id: Idd6c6c4cbed5f207282432ebad36fefc54876e6f
Reviewed-on: https://chromium-review.googlesource.com/1134439
Commit-Ready: Allen Li <ayatane@chromium.org>
Tested-by: Allen Li <ayatane@chromium.org>
Reviewed-by: Xixuan Wu <xixuan@chromium.org>

[modify] https://crrev.com/c4f78a783c7cc5847c10f10d87a2258d6fd0f3e0/src/lucifer/oldcmd/lucifer_run_job/main.go
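For readers who have not looked at the change, the ordering problem described in the commit message can be illustrated with a small Go sketch. This is not the lucifer code: host, prepare, runAutoserv, markEarly and markLate are hypothetical names, and prepare stands in for any step that can return early before the test runs.

```go
package main

import (
	"errors"
	"fmt"
)

// host is a hypothetical stand-in for a DUT record with a stored state.
type host struct{ state string }

var errPrep = errors.New("preparation failed")

// prepare is a hypothetical step that may fail before the test starts.
func prepare(fail bool) error {
	if fail {
		return errPrep
	}
	return nil
}

// runAutoserv is a hypothetical placeholder for invoking autoserv.
func runAutoserv() {}

// markEarly shows the problematic ordering: the host is marked running
// before a step that can return early, so the early return leaves it stuck.
func markEarly(h *host, failPrep bool) error {
	h.state = "running"
	if err := prepare(failPrep); err != nil {
		return err // host is still "running" here
	}
	runAutoserv()
	h.state = "ready"
	return nil
}

// markLate marks the host running only just before autoserv, after every
// early return, so a failed preparation never touches the stored state.
func markLate(h *host, failPrep bool) error {
	if err := prepare(failPrep); err != nil {
		return err
	}
	h.state = "running"
	runAutoserv()
	h.state = "ready"
	return nil
}

func main() {
	a := &host{state: "ready"}
	_ = markEarly(a, true)
	b := &host{state: "ready"}
	_ = markLate(b, true)
	fmt.Println(a.state, b.state) // prints "running ready"
}
```

The sketch only covers early returns before autoserv starts; it says nothing about how the real code restores the state after autoserv finishes.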
,
Jul 17
Seems like this is needed in the current phase (mark skylab-based paladin important). Marking this fixed to indicate that the reported problem is now gone. Please use a separate bug to track removing the RUNNING/PROVISIONING host states entirely.
,
Jul 20
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/infra/lucifer/+/c090ec8583869c737ebeb9aa3dcc719a91af002a

commit c090ec8583869c737ebeb9aa3dcc719a91af002a
Author: Allen Li <ayatane@google.com>
Date: Fri Jul 20 01:20:51 2018

skylab_swarming_worker: Don't track running state

This state doesn't do anything, but just introduces the opportunity
for letting a host get stuck in Running state like with Autotest.

BUG= chromium:862793
TEST=None

Change-Id: Ibbf41091b116d3abbe7dbaae3ce3bce09976f3be
Reviewed-on: https://chromium-review.googlesource.com/1134568
Commit-Ready: Allen Li <ayatane@chromium.org>
Tested-by: Allen Li <ayatane@chromium.org>
Reviewed-by: Xixuan Wu <xixuan@chromium.org>

[modify] https://crrev.com/c090ec8583869c737ebeb9aa3dcc719a91af002a/src/lucifer/cmd/skylab_swarming_worker/lucifer.go
[modify] https://crrev.com/c090ec8583869c737ebeb9aa3dcc719a91af002a/src/lucifer/cmd/skylab_swarming_worker/internal/swarming/botcache/botcache.go
Comment 1 by pprabhu@chromium.org, Jul 11