
Issue 862793

Starred by 2 users

Issue metadata

Status: Fixed
Owner:
Closed: Jul 17
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug

Blocking:
issue 862431




tko/parse failure left Skylab bot in dut_state "running"

Project Member Reported by pprabhu@chromium.org, Jul 11

Issue description

Example task: https://chrome-swarming.appspot.com/task?id=3ea3f9d0d1546210

The tko/parse failure is tracked separately in issue 862431.

But the bot was left in dut_state running at the end of this task, which means that no other tasks could be scheduled against it.

 
Blocking: 862431
I think we should just get rid of the running state in Skylab. I don't want to replicate the "DUT state is wrong" mess in Skylab too.

Swarming already knows whether a bot/DUT has a job. We only need markers for whether a DUT is dead or healthy; otherwise, it is running if it has a job and ready if it does not.
This does not affect Autotest/Moblab, since job_aborter makes sure not to leave dangling jobs/hosts.
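The state model proposed above (persist only a dead/healthy marker and derive "running" vs. "ready" from whether Swarming shows a job on the bot) can be sketched in Go. This is a minimal illustration under stated assumptions: `HealthMarker`, `EffectiveState`, and `effectiveState` are hypothetical names, not part of the Skylab, lucifer, or Swarming APIs, and `botHasJob` stands in for whatever Swarming reports about the bot.

```go
package main

import "fmt"

// HealthMarker is the only DUT state this sketch persists.
// Hypothetical type; not a real Skylab or Swarming API.
type HealthMarker int

const (
	Healthy HealthMarker = iota
	Dead
)

// EffectiveState is the state reported for a DUT (hypothetical).
type EffectiveState string

const (
	StateDead    EffectiveState = "dead"
	StateRunning EffectiveState = "running"
	StateReady   EffectiveState = "ready"
)

// effectiveState derives the reported state instead of storing "running":
// a dead DUT is dead regardless of jobs; a healthy DUT is running if
// Swarming reports a job on its bot (botHasJob, assumed input) and ready
// otherwise.
func effectiveState(marker HealthMarker, botHasJob bool) EffectiveState {
	if marker == Dead {
		return StateDead
	}
	if botHasJob {
		return StateRunning
	}
	return StateReady
}

func main() {
	// Nothing has to be reset when a task exits abnormally: once Swarming
	// no longer shows a job on the bot, the DUT reads as ready again.
	fmt.Println(effectiveState(Healthy, true))
	fmt.Println(effectiveState(Healthy, false))
	fmt.Println(effectiveState(Dead, false))
}
```

Because "running" is never written anywhere in this model, there is no stored value that a failed task (such as the tko/parse failure in the example task) could leave stale.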
Status: Started (was: Assigned)
I think I can fix this with a one-line change; I still want to hold onto this bug for getting rid of the running state, though.

Comment 5 by bugdroid1@chromium.org, Jul 13

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/infra/lucifer/+/c4f78a783c7cc5847c10f10d87a2258d6fd0f3e0

commit c4f78a783c7cc5847c10f10d87a2258d6fd0f3e0
Author: Allen Li <ayatane@google.com>
Date: Fri Jul 13 01:50:18 2018

lucifer: Mark host running just before autoserv

Mark host running just before autoserv, in particular, after we
potentially return out of doRunningStep without setting the host back
to ready or something else in updateHostState.

BUG= chromium:862793 
TEST=None

Change-Id: Idd6c6c4cbed5f207282432ebad36fefc54876e6f
Reviewed-on: https://chromium-review.googlesource.com/1134439
Commit-Ready: Allen Li <ayatane@chromium.org>
Tested-by: Allen Li <ayatane@chromium.org>
Reviewed-by: Xixuan Wu <xixuan@chromium.org>

[modify] https://crrev.com/c4f78a783c7cc5847c10f10d87a2258d6fd0f3e0/src/lucifer/oldcmd/lucifer_run_job/main.go
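The ordering problem described in the commit message above can be illustrated with a simplified Go sketch. It does not reproduce lucifer's actual `doRunningStep` or `updateHostState`; `host`, `runningStepBefore`, `runningStepAfter`, `precheck`, `autoserv`, and `errSkip` are all hypothetical names. The point is only that marking the host running before an early-return path can leave it stuck, while marking it just before autoserv cannot.

```go
package main

import (
	"errors"
	"fmt"
)

// host is a hypothetical in-memory stand-in for the recorded host status.
type host struct {
	state string
}

// errSkip models a hypothetical condition under which the running step
// returns early without launching autoserv.
var errSkip = errors.New("skip: returning before autoserv")

// runningStepBefore mirrors the buggy ordering: the host is marked running
// first, so an early return skips the reset and leaves it "running".
func runningStepBefore(h *host, precheck, autoserv func() error) error {
	h.state = "running"
	if err := precheck(); err != nil {
		return err // host is left stuck in "running"
	}
	err := autoserv()
	h.state = "ready" // stands in for the state update after autoserv
	return err
}

// runningStepAfter mirrors the fix: mark the host running only after all
// early-return paths, immediately before autoserv.
func runningStepAfter(h *host, precheck, autoserv func() error) error {
	if err := precheck(); err != nil {
		return err // host state is untouched
	}
	h.state = "running"
	err := autoserv()
	h.state = "ready"
	return err
}

func main() {
	skip := func() error { return errSkip }
	ok := func() error { return nil }

	h1 := &host{state: "ready"}
	_ = runningStepBefore(h1, skip, ok)
	fmt.Println(h1.state) // running: the host is stuck

	h2 := &host{state: "ready"}
	_ = runningStepAfter(h2, skip, ok)
	fmt.Println(h2.state) // ready
}
```

The fix narrows the window but still relies on every path after autoserv resetting the state, which is why the thread also proposes removing the running state entirely.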

Status: Fixed (was: Started)
This seems to be needed for the current phase (marking the Skylab-based paladin important).

Marking this fixed to indicate that the reported problem is now gone. Please use a separate bug to track removing RUNNING/PROVISIONING host states entirely.

Comment 7 by bugdroid1@chromium.org, Jul 20

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/infra/lucifer/+/c090ec8583869c737ebeb9aa3dcc719a91af002a

commit c090ec8583869c737ebeb9aa3dcc719a91af002a
Author: Allen Li <ayatane@google.com>
Date: Fri Jul 20 01:20:51 2018

skylab_swarming_worker: Don't track running state

This state doesn't do anything, but just introduces the opportunity
for letting a host get stuck in Running state like with Autotest.

BUG= chromium:862793 
TEST=None

Change-Id: Ibbf41091b116d3abbe7dbaae3ce3bce09976f3be
Reviewed-on: https://chromium-review.googlesource.com/1134568
Commit-Ready: Allen Li <ayatane@chromium.org>
Tested-by: Allen Li <ayatane@chromium.org>
Reviewed-by: Xixuan Wu <xixuan@chromium.org>

[modify] https://crrev.com/c090ec8583869c737ebeb9aa3dcc719a91af002a/src/lucifer/cmd/skylab_swarming_worker/lucifer.go
[modify] https://crrev.com/c090ec8583869c737ebeb9aa3dcc719a91af002a/src/lucifer/cmd/skylab_swarming_worker/internal/swarming/botcache/botcache.go
