Issue 854311

Issue metadata

Status: Duplicate
Merged: issue 854352
Owner:
Closed: Jun 2018
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: ----



Bot with state 'need_reset' still run tasks with 'dut_state:ready'.

Project Member Reported by xixuan@chromium.org, Jun 19 2018

Issue description

For a given bot: https://chrome-swarming.appspot.com/bot?id=chromeos-skylab-bot-46c6c402-c6cb-4b7c-a96d-40f548f364ba&sort_stats=total%3Adesc

Here are some examples of failed tests:

https://chrome-swarming.appspot.com/task?id=3e322a3cd05c1a10&refresh=10
https://chrome-swarming.appspot.com/task?id=3e322a7a79630510&refresh=10
https://chrome-swarming.appspot.com/task?id=3e322a7dc2d71310&refresh=10

I see 2 issues and guess they're related:

1) The bot should be in state 'need_reset' after the first failure. However, it is still accepting tasks. This makes me assume that these tasks were pre-allocated to this bot.

2) The bot runs different tasks at the same time. I'm not sure whether this is the cause of the failure 'Client job got aborted.'.

I will let @pprabhu decide whether this should be fixed on the swarming side or in lucifer.
 

Comment 1 by xixuan@chromium.org, Jun 19 2018

Surprisingly, this does not always happen. There are some passing suites that clearly show the next task starting only after the previous task has finished:

e.g. https://chrome-swarming.appspot.com/bot?id=chromeos-skylab-bot-6ebbc18f-a5fa-4342-af4c-97efa5960966&sort_stats=total%3Adesc

Comment 2 by xixuan@chromium.org, Jun 19 2018

Cc: akes...@chromium.org

Comment 3 by xixuan@chromium.org, Jun 19 2018

Summary: Bot with state 'need_reset' still run tasks with 'dut_state:ready'. (was: Client job got aborted )
Created Issue 854352 for issue 2. This bug now focuses on the first issue: a bot in state need_reset should stop accepting tasks.
Mergedinto: 854352
Status: Duplicate (was: Assigned)
Most likely this is also fallout from multiple tasks running on the same bot.

Order of events:

- Bot is in state:ready
- Bot process A picks up test 1
- Bot process B picks up test 2
- test 1 fails. Bot moves to state needs_reset
- test 2 succeeds. Bot moves to state ready.

- Next test gets scheduled on the bot.
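
The order of events above can be sketched as a small simulation. This is only an illustration under stated assumptions: the `Bot` class, its `dut_state` field, and the `finish_task` method are hypothetical and do not reflect the real Swarming or Skylab code. The sketch assumes each finishing task overwrites the bot's state without looking at what a concurrent task reported, so the last writer wins.

```python
from dataclasses import dataclass, field

READY = "ready"
NEEDS_RESET = "needs_reset"


@dataclass
class Bot:
    """Hypothetical stand-in for a bot; not the real Swarming data model."""
    dut_state: str = READY
    history: list = field(default_factory=list)

    def can_accept_task(self) -> bool:
        # A bot is only eligible for tasks that require dut_state:ready.
        return self.dut_state == READY

    def finish_task(self, name: str, passed: bool) -> None:
        # Assumption: the finishing task overwrites the shared state
        # unconditionally, ignoring any concurrent task's result.
        self.dut_state = READY if passed else NEEDS_RESET
        self.history.append((name, self.dut_state))


bot = Bot()

# Both bot processes pick up work while the bot is still ready.
assert bot.can_accept_task()  # process A picks up test 1
assert bot.can_accept_task()  # process B picks up test 2

bot.finish_task("test 1", passed=False)  # bot moves to needs_reset
bot.finish_task("test 2", passed=True)   # bot moves back to ready

print(bot.history)            # [('test 1', 'needs_reset'), ('test 2', 'ready')]
print(bot.can_accept_task())  # True: the next test gets scheduled
```

In this sketch the failure recorded by test 1 is lost as soon as test 2 succeeds, which matches the observation that the bot keeps accepting tasks that require 'dut_state:ready'.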
