chromeos15-row1-rack6-host4 is stuck "Running"
Reported by jrbarnette@chromium.org, Aug 28
Issue description
chromeos15-row1-rack6-host4 is stuck in state "Running". The DUT has no shard, even though it has "board:banon" and banon is assigned to shard cros-full-0023.mtv.corp.google.com.
Sep 5
The last job on that host was aborted, but job_reporter claims that it did transition the host to Ready:
https://stainless.corp.google.com/browse/chromeos-autotest-results/227285760-chromeos-test/chromeos15-row1-rack6-host4/
job_reporter: 2018-08-14 10:49:56,075:DEBUG:handlers:__call__:51:Received event 'HOST_READY' with message 'chromeos15-row1-rack6-host4'
job_reporter: 2018-08-14 10:49:56,089:INFO:models:on_attribute_changed:775:chromeos15-row1-rack6-host4 -> Ready
Sep 5
Database surgery like this needs to be done in both the master and the shard DB.
The shard heartbeat moves information from shard --> master,
but sentinel moves it from master --> shard,
so a change made on only one side can be overwritten by the other.
That makes any surgery like this hard to get right.
I'll give it another try.
On both shard and master:
mysql> update afe_hosts set status='Ready' where id = 7998;
Query OK, 1 row affected (0.02 sec)
Rows matched: 1 Changed: 1 Warnings: 0
mysql> update afe_hosts set leased=0 where id = 7998;
Query OK, 0 rows affected (0.00 sec)
Rows matched: 1 Changed: 0 Warnings: 0
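For illustration only, the "same update on both sides" step can be sketched in Python. This is a minimal sketch under stated assumptions: the real databases are MySQL, but the sketch uses in-memory sqlite3 stand-ins so that it runs anywhere; the table is cut down to four of the afe_hosts columns; and the helper names (make_db, reset_host, reset_host_everywhere) are hypothetical, not part of autotest.

```python
import sqlite3

def make_db():
    # In-memory stand-in for one AFE database (assumption: the real ones are MySQL).
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE afe_hosts ("
        "id INTEGER PRIMARY KEY, hostname TEXT, status TEXT, leased INTEGER)"
    )
    # Starting state taken from this bug: host stuck in 'Running', leased already 0.
    conn.execute(
        "INSERT INTO afe_hosts VALUES "
        "(7998, 'chromeos15-row1-rack6-host4', 'Running', 0)"
    )
    conn.commit()
    return conn

def reset_host(conn, host_id):
    """Set status='Ready' and leased=0 for one host; return the affected row count."""
    cur = conn.execute(
        "UPDATE afe_hosts SET status = 'Ready', leased = 0 WHERE id = ?",
        (host_id,),
    )
    conn.commit()
    return cur.rowcount

def reset_host_everywhere(conns, host_id):
    # Apply the identical update to every database, so that neither sync
    # direction (heartbeat or sentinel) has a stale row left to copy back.
    return {name: reset_host(conn, host_id) for name, conn in conns.items()}

dbs = {"master": make_db(), "shard": make_db()}
result = reset_host_everywhere(dbs, 7998)
print(result)
```

Folding both columns into one UPDATE per database keeps the window between the two writes small; it does not remove the race with the sync jobs.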
Finally, on the shard a provision has begun:
mysql> select * from afe_hosts where id = 7998 \G
*************************** 1. row ***************************
id: 7998
hostname: chromeos15-row1-rack6-host4
locked: 0
synch_id: NULL
status: Provisioning
invalid: 0
protection: 0
locked_by_id: NULL
lock_time: NULL
dirty: 1
leased: 1
shard_id: 245
lock_reason:
1 row in set (0.00 sec)
But, on the master, the leased bit is still wrong:
mysql> select * from afe_hosts where id = 7998 \G
*************************** 1. row ***************************
id: 7998
hostname: chromeos15-row1-rack6-host4
locked: 0
synch_id: NULL
status: Provisioning
invalid: 0
protection: 0
locked_by_id: NULL
lock_time: NULL
dirty: 1
leased: 0
shard_id: 245
lock_reason:
1 row in set (0.00 sec)
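Comparing the two rows by eye is error-prone, so here is a small sketch of a column-by-column diff. The row values are copied from the two queries above (a subset of columns); the function name diff_rows is hypothetical and the rows are plain dicts rather than live query results.

```python
def diff_rows(master_row, shard_row):
    """Return {column: (master_value, shard_value)} for every column that differs."""
    return {
        col: (master_row[col], shard_row.get(col))
        for col in master_row
        if master_row[col] != shard_row.get(col)
    }

# Values copied from the master and shard SELECT output in this comment.
master = {"id": 7998, "status": "Provisioning", "dirty": 1, "leased": 0, "shard_id": 245}
shard = {"id": 7998, "status": "Provisioning", "dirty": 1, "leased": 1, "shard_id": 245}

mismatch = diff_rows(master, shard)
print(mismatch)  # {'leased': (0, 1)}
```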
Wait and watch.
Sep 6
The provision that followed failed:
https://stainless.corp.google.com/browse/chromeos-autotest-results/hosts/chromeos15-row1-rack6-host4/1752662-provision/20180509162225/
But that's some firmware dependency problem (likely because the DUT hasn't updated in a while). This bug is fixed.
Comment 1 by jrbarnette@chromium.org, Aug 28