Issue 878192

Issue metadata

Status: Fixed
Owner:
Closed: Sep 6
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 2
Type: Bug



chromeos15-row1-rack6-host4 is stuck "Running"

Reported by jrbarnette@chromium.org, Aug 28

Issue description

chromeos15-row1-rack6-host4 is stuck in state "Running".
The DUT has no shard, even though it has "board:banon" and
banon is assigned to shard cros-full-0023.mtv.corp.google.com.

 
$ atest host list -w cros-full-0023.mtv.corp.google.com chromeos15-row1-rack6-host4
Unknown host(s): 
        chromeos15-row1-rack6-host4

So, maybe just perform some simple database surgery.

MySQL [chromeos_autotest_db]> update afe_hosts set status="Ready" where hostname="chromeos15-row1-rack6-host4";
Query OK, 1 row affected (0.03 sec)
Rows matched: 1  Changed: 1  Warnings: 0

And after a longer-than-suitable interval:
    $ atest host list chromeos15-row1-rack6-host4
    Host                         Status     Shard  [ ... ]
    chromeos15-row1-rack6-host4  Repairing  None   [ ... ]

But it's still unknown to the shard...

Cc: aashuto...@chromium.org harpreet@chromium.org anmarroquin@chromium.org
Owner: jrbarnette@chromium.org
Status: Assigned (was: Untriaged)
Labels: Hotlist-Deputy
Cc: cros-conn-test-team@google.com
Owner: pprabhu@chromium.org
The last job on that host was aborted, but job_reporter claims that it did transition the host to Ready:

https://stainless.corp.google.com/browse/chromeos-autotest-results/227285760-chromeos-test/chromeos15-row1-rack6-host4/

job_reporter: 2018-08-14 10:49:56,075:DEBUG:handlers:__call__:51:Received event 'HOST_READY' with message 'chromeos15-row1-rack6-host4'
job_reporter: 2018-08-14 10:49:56,089:INFO:models:on_attribute_changed:775:chromeos15-row1-rack6-host4 -> Ready

Database surgery like this needs to be done in both the master and the shard DB.

The shard heartbeat will move information from shard --> master,
but the sentinel will move it from master --> shard.

So any surgery like this is hard to get right.
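To see why a one-sided edit doesn't stick, here is a toy model of the two sync directions. It is a sketch under loud assumptions: each database is just a dict of host id -> row, and the hypothetical `sync()` copies whole rows, which is much cruder than what the real shard heartbeat and sentinel do. The point it illustrates is only the ordering problem: an edit made on one side can be overwritten by the stale copy on the other.

```python
# Toy model (hypothetical, not the real autotest sync code): each database is a
# dict of host id -> row, and each sync direction copies whole rows.

def sync(src, dst):
    """Copy every host row from src to dst, overwriting dst's copy."""
    for host_id, row in src.items():
        dst[host_id] = dict(row)

def make_dbs():
    """Return (master, shard), both holding the same stale row for host 7998."""
    row = {"hostname": "chromeos15-row1-rack6-host4", "status": "Running", "leased": 1}
    return {7998: dict(row)}, {7998: dict(row)}

# One-sided surgery: edit only the master, then the heartbeat runs.
master, shard = make_dbs()
master[7998]["status"] = "Ready"
sync(shard, master)  # heartbeat: shard --> master clobbers the edit
one_sided = master[7998]["status"]

# Two-sided surgery: apply the same edit to both copies before any sync.
master, shard = make_dbs()
for db in (master, shard):
    db[7998]["status"] = "Ready"
sync(shard, master)  # heartbeat: shard --> master
sync(master, shard)  # sentinel: master --> shard
two_sided = (master[7998]["status"], shard[7998]["status"])

print(one_sided)  # Running
print(two_sided)  # ('Ready', 'Ready')
```

In this model the master-only edit survives only if the sentinel happens to run before the next heartbeat, which is exactly the kind of race that makes this surgery hard to get right.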

I'll give it another try.

On both the shard and the master:

mysql> update afe_hosts set status='Ready' where id = 7998;
Query OK, 1 row affected (0.02 sec)
Rows matched: 1  Changed: 1  Warnings: 0

mysql> update afe_hosts set leased=0 where id = 7998;
Query OK, 0 rows affected (0.00 sec)
Rows matched: 1  Changed: 0  Warnings: 0
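A minimal sketch of the same two UPDATEs applied to both databases from one script. Assumptions, all hypothetical: in-memory sqlite3 databases stand in for the real MySQL master and shard, the table is a made-up subset of the afe_hosts columns, and the starting values are invented for the example.

```python
import sqlite3

# Hypothetical minimal schema: only the afe_hosts columns this bug touches.
SCHEMA = """
CREATE TABLE afe_hosts (
    id INTEGER PRIMARY KEY,
    hostname TEXT NOT NULL,
    status TEXT NOT NULL,
    leased INTEGER NOT NULL,
    shard_id INTEGER
)
"""

def make_db(status, leased):
    """Build an in-memory stand-in for one AFE database (master or shard)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.execute(
        "INSERT INTO afe_hosts VALUES (?, ?, ?, ?, ?)",
        (7998, "chromeos15-row1-rack6-host4", status, leased, 245),
    )
    return conn

def reset_host(conns, host_id):
    """Apply the same two UPDATEs to every database, one transaction per database."""
    for conn in conns:
        with conn:  # commits on success, rolls back on exception
            conn.execute("UPDATE afe_hosts SET status = 'Ready' WHERE id = ?", (host_id,))
            conn.execute("UPDATE afe_hosts SET leased = 0 WHERE id = ?", (host_id,))

master = make_db("Running", 0)  # invented starting values
shard = make_db("Running", 1)
reset_host([master, shard], 7998)

for name, conn in (("master", master), ("shard", shard)):
    print(name, conn.execute("SELECT status, leased FROM afe_hosts WHERE id = 7998").fetchone())
# master ('Ready', 0)
# shard ('Ready', 0)
```

Each database gets its own transaction, but nothing makes the pair atomic: a heartbeat or sentinel pass can still land between the two databases' updates. In the MySQL output above, "Rows matched: 1  Changed: 0" for the leased update means the row was found but leased was already 0 on that database.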


Finally, on the shard, a provision has begun:

mysql> select * from afe_hosts where id = 7998 \G
*************************** 1. row ***************************
          id: 7998
    hostname: chromeos15-row1-rack6-host4
      locked: 0
    synch_id: NULL
      status: Provisioning
     invalid: 0
  protection: 0
locked_by_id: NULL
   lock_time: NULL
       dirty: 1
      leased: 1
    shard_id: 245
 lock_reason:
1 row in set (0.00 sec)

But, on the master, the leased bit is still wrong:

mysql> select * from afe_hosts where id = 7998 \G
*************************** 1. row ***************************
          id: 7998
    hostname: chromeos15-row1-rack6-host4
      locked: 0
    synch_id: NULL
      status: Provisioning
     invalid: 0
  protection: 0
locked_by_id: NULL
   lock_time: NULL
       dirty: 1
      leased: 0
    shard_id: 245
 lock_reason:
1 row in set (0.00 sec)
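Spotting the mismatch by eye is error-prone, so here is a small sketch that diffs the two rows. The values are copied from the shard and master listings above, trimmed to a few columns; `diff_rows` is a hypothetical helper, not part of autotest.

```python
# Rows from the shard and master listings above, trimmed to the relevant columns.
shard_row = {"id": 7998, "status": "Provisioning", "dirty": 1, "leased": 1, "shard_id": 245}
master_row = {"id": 7998, "status": "Provisioning", "dirty": 1, "leased": 0, "shard_id": 245}

def diff_rows(master, shard):
    """Return {column: (master_value, shard_value)} for every column that differs."""
    return {
        col: (master.get(col), shard.get(col))
        for col in sorted(master.keys() | shard.keys())
        if master.get(col) != shard.get(col)
    }

print(diff_rows(master_row, shard_row))  # {'leased': (0, 1)}
```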

Wait and watch.



Status: Fixed (was: Assigned)
The provision that followed failed: https://stainless.corp.google.com/browse/chromeos-autotest-results/hosts/chromeos15-row1-rack6-host4/1752662-provision/20180509162225/

But that's some firmware dependency problem (likely because the DUT hasn't updated in a while).

This bug is fixed.
