
Issue 754362

Starred by 3 users

Issue metadata

Status: WontFix
Owner:
Last visit > 30 days ago
Closed: Sep 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug



USB stick is not being unplugged during normal operation.

Project Member Reported by haoweiw@chromium.org, Aug 10 2017

Issue description

A few DUTs failed CTS tests because the USB stick was not unplugged. The stick's additional partitions are detected by the DUT and interfere with the CTS tests.

chromeos6-row1-rack23-host19 (FAILED):
localhost ~ # lsusb
Bus 002 Device 004: ID 13fe:5500 Kingston Technology Company Inc. 

GPT PMBR size mismatch (6905791 != 15466495) will be corrected by w(rite).
Disk /dev/sda: 7.4 GiB, 7918845952 bytes, 15466496 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 8B6017EC-09B6-FE48-BA68-36892FC16784

Device       Start     End Sectors  Size Type
/dev/sda1  4382720 6873087 2490368  1.2G Microsoft basic data
/dev/sda2    20480   53247   32768   16M ChromeOS kernel
/dev/sda3   286720 4382719 4096000    2G ChromeOS root fs
/dev/sda4    53248   86015   32768   16M ChromeOS kernel
/dev/sda5   282624  286719    4096    2M ChromeOS root fs
/dev/sda6    16448   16448       1  512B ChromeOS kernel
/dev/sda7    16449   16449       1  512B ChromeOS root fs
/dev/sda8    86016  118783   32768   16M Microsoft basic data
/dev/sda9    16450   16450       1  512B ChromeOS reserved
/dev/sda10   16451   16451       1  512B ChromeOS reserved
/dev/sda11      64   16447   16384    8M unknown
/dev/sda12  249856  282623   32768   16M EFI System

Partition table entries are not in disk order.

localhost ~ # df -h | grep sda
/dev/sda1                1.2G 1009M  141M  88% /media/removable/STATE
/dev/sda8                 12M   28K   12M   1% /media/removable/OEM
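One way to spot this state from the DUT side is to look for filesystems mounted under `/media/removable/`, as in the `df -h` output above. The sketch below assumes that any mount under that prefix means the servo USB stick is visible. The function name `removable_mounts` is hypothetical and not part of any lab tool.

```python
from typing import List, Tuple

# Mount prefix seen in the df output above. Treating every mount under it as
# evidence of a visible USB stick is an assumption of this sketch.
REMOVABLE_PREFIX = "/media/removable/"


def removable_mounts(df_output: str) -> List[Tuple[str, str]]:
    """Return (device, mountpoint) pairs for filesystems under /media/removable/."""
    mounts = []
    for line in df_output.splitlines():
        fields = line.split()
        # df -h columns: Filesystem Size Used Avail Use% Mounted-on
        if len(fields) < 6:
            continue
        device, mountpoint = fields[0], " ".join(fields[5:])
        if mountpoint.startswith(REMOVABLE_PREFIX):
            mounts.append((device, mountpoint))
    return mounts


sample = """\
Filesystem               Size  Used Avail Use% Mounted on
/dev/sda1                1.2G 1009M  141M  88% /media/removable/STATE
/dev/sda8                 12M   28K   12M   1% /media/removable/OEM
"""
print(removable_mounts(sample))
# [('/dev/sda1', '/media/removable/STATE'), ('/dev/sda8', '/media/removable/OEM')]
```

The header row is skipped naturally because its "Mounted on" column never starts with the prefix.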

Digging into the Servo status: two settings need to be turned off to disable the USB3 connection from Servo, so that the DUT won't see the USB stick and the stick won't interfere with testing.
prtctl4_pwren:off
usb3_pwr_en:off

Unfortunately, either these settings are still on, or Servo has lost its connection to the labstation.
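A minimal sketch of setting and checking the two controls with `dut-control`. It assumes `dut-control name:value ...` sets controls and that querying controls prints one `name:value` line each; servod host/port selection is left out. The helper names are hypothetical. A missing value is treated as "not off", so a servod that has lost its connection shows up the same way as a control left on.

```python
import subprocess
from typing import Dict, Iterable, List

# Servo controls named in this report; both must be "off" to hide the stick.
USB_STICK_CONTROLS = ("prtctl4_pwren", "usb3_pwr_en")


def build_dut_control_args(controls: Iterable[str], value: str = "off") -> List[str]:
    """Build a `dut-control name:value ...` command line (port selection omitted)."""
    return ["dut-control"] + [f"{name}:{value}" for name in controls]


def parse_dut_control(output: str) -> Dict[str, str]:
    """Parse `name:value` lines, assumed to be what dut-control prints per control."""
    values = {}
    for line in output.splitlines():
        name, sep, value = line.strip().partition(":")
        if sep:
            values[name] = value
    return values


def controls_not_off(output: str, controls: Iterable[str] = USB_STICK_CONTROLS) -> List[str]:
    """Return controls not reported as 'off'; a missing control counts as not off."""
    values = parse_dut_control(output)
    return [name for name in controls if values.get(name) != "off"]


def turn_off_usb_stick(controls: Iterable[str] = USB_STICK_CONTROLS) -> None:
    # Needs a reachable servod; raises CalledProcessError if dut-control fails.
    subprocess.run(build_dut_control_args(controls), check=True)


print(controls_not_off("prtctl4_pwren:off\nusb3_pwr_en:on\n"))  # ['usb3_pwr_en']
```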

For more information, please refer to this bug: b/64332561


 
Owner: dgarr...@chromium.org
Status: Assigned (was: Untriaged)
Assigning to dgarrett (the infra deputy for this week) 
Servo status is checked and reported in every repair or provision
task, so...

    $ dut-status -f chromeos6-row1-rack23-host19 | egrep '(provision|repair)' | head -1
        2017-08-10 11:25:43  OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos6-row1-rack23-host19/1302809-provision/
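The `egrep '(provision|repair)' | head -1` pipeline above picks the first history line that mentions either task type. Here is a rough Python equivalent, assuming (as the `head -1` does) that `dut-status` lists newest entries first and that each history line is `DATE TIME STATUS URL`. Unlike the shell version, it skips matching lines that don't have that shape.

```python
import re
from typing import NamedTuple, Optional


class HistoryEntry(NamedTuple):
    when: str
    status: str
    log_url: str


# Same pattern as the egrep above.
TASK_RE = re.compile(r"(provision|repair)")


def latest_task(dut_status_output: str) -> Optional[HistoryEntry]:
    """Return the first provision/repair line, i.e. the newest if output is newest-first."""
    for line in dut_status_output.splitlines():
        if TASK_RE.search(line):
            fields = line.split()
            # Assumed layout: DATE TIME STATUS URL, as in the example line above.
            if len(fields) >= 4:
                return HistoryEntry(f"{fields[0]} {fields[1]}", fields[2], fields[3])
    return None


sample = (
    "    2017-08-10 11:25:43  OK http://cautotest/tko/retrieve_logs.cgi"
    "?job=/results/hosts/chromeos6-row1-rack23-host19/1302809-provision/\n"
)
print(latest_task(sample).status)  # OK
```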

Looking in status.log for that provision task, we find this:
START	----	provision	timestamp=1502321906	localtime=Aug 09 16:38:26	
	GOOD	----	verify.servo_ssh	timestamp=1502321910	localtime=Aug 09 16:38:30	
	GOOD	----	verify.update	timestamp=1502321912	localtime=Aug 09 16:38:32	
	GOOD	----	verify.brd_config	timestamp=1502321913	localtime=Aug 09 16:38:33	
	GOOD	----	verify.ser_config	timestamp=1502321913	localtime=Aug 09 16:38:33	
	GOOD	----	verify.job	timestamp=1502321914	localtime=Aug 09 16:38:34	
	FAIL	----	verify.servod	timestamp=1502321974	localtime=Aug 09 16:39:34	''
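The same check can be scripted by scanning `status.log` for `FAIL` entries. The field layout below (tab-separated `STATUS`, subdir, operation name, then `key=value` fields and a reason, with leading tabs on nested entries) is inferred from the excerpt above and should be treated as an assumption.

```python
from typing import List


def failed_steps(status_log: str) -> List[str]:
    """Return the operation names of FAIL entries in a status.log excerpt."""
    failures = []
    for line in status_log.splitlines():
        # Nested entries are indented with tabs; fields are tab-separated.
        fields = line.lstrip("\t").split("\t")
        if len(fields) >= 3 and fields[0] == "FAIL":
            failures.append(fields[2])
    return failures


sample = (
    "START\t----\tprovision\ttimestamp=1502321906\tlocaltime=Aug 09 16:38:26\t\n"
    "\tGOOD\t----\tverify.servo_ssh\ttimestamp=1502321910\tlocaltime=Aug 09 16:38:30\t\n"
    "\tFAIL\t----\tverify.servod\ttimestamp=1502321974\tlocaltime=Aug 09 16:39:34\t''\n"
)
print(failed_steps(sample))  # ['verify.servod']
```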

So, there's something wrong with talking to the servod process for
that DUT.  The initialization code that guarantees that the USB stick
gets unplugged and then stays that way has to talk to servod.

This doesn't explain why the USB stick got plugged in in the first
place, but it does explain why it's not getting unplugged now.

Provision only verifies and reports servo status:  Forcing a repair
task will trigger automated procedures to get servod working, including
restarting the servod process.  So, forcing the DUT through repair is
the first thing to try.
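The distinction between the two task types can be modeled in a few lines: verify only reports failing checks, while repair runs a repair action for each failure and then re-verifies. This is a toy model with hypothetical names, not autotest's actual verify/repair API; the lambda that flips `servod_running` stands in for restarting servod.

```python
from typing import Callable, Dict, List


class ServoHealth:
    """Toy verify-vs-repair model; names are hypothetical, not autotest's API."""

    def __init__(self, checks: Dict[str, Callable[[], bool]],
                 repairs: Dict[str, Callable[[], None]]):
        self.checks = checks    # check name -> returns True when healthy
        self.repairs = repairs  # check name -> action that may fix it

    def verify(self) -> List[str]:
        """Provision-style: only report which checks fail."""
        return [name for name, check in self.checks.items() if not check()]

    def repair(self) -> List[str]:
        """Repair-style: run repair actions for failures, then re-verify."""
        for name in self.verify():
            action = self.repairs.get(name)
            if action is not None:
                action()
        return self.verify()


state = {"servod_running": False}
health = ServoHealth(
    checks={"servod": lambda: state["servod_running"]},
    repairs={"servod": lambda: state.update(servod_running=True)},  # "restart servod"
)
print(health.verify())  # ['servod']
print(health.repair())  # []
```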

So... what is the next step?
> So... what is the next step?

Force repair on all the problem children.
There are enough of them that it ought to be done automatically, but...
there's no script to force repair, only to verify.

This CL will add a command that can force repair:
    https://chromium-review.googlesource.com/#/c/chromiumos/third_party/autotest/+/611252/
That's reactive. Is there some proactive way to prevent this issue from coming back?
> That's reactive. Is there some proactive way to prevent this issue from coming back?

Hmmm...  Well, we should probably have a different bug for
any long term preventative strategies.  But, for a short summary:
we'd have to force servo repair more aggressively, even in
operations when servo isn't required.  For instance, we could
call servo repair rather than verify during provisioning.  A more
complex possibility would be a verifier that looks for the USB
stick's presence, and triggers servo repair.
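A rough sketch of the second idea: a check that runs `lsusb` on the DUT and raises when a known servo USB stick is visible, so the caller can escalate to servo repair. The ID set only contains the Kingston stick (`13fe:5500`) seen in this report; a real check would need the full set of stick IDs. The `run` callable, the error class, and the function names are hypothetical.

```python
import re
from typing import Callable, List, Set, Tuple

LSUSB_RE = re.compile(r"^Bus (\d{3}) Device (\d{3}): ID ([0-9a-f]{4}):([0-9a-f]{4})", re.I)

# Only the stick from this report; extending this set is an assumption.
KNOWN_STICK_IDS: Set[Tuple[str, str]] = {("13fe", "5500")}


class UsbStickVisibleError(Exception):
    """Raised when the DUT can see the servo USB stick (would trigger servo repair)."""


def visible_sticks(lsusb_output: str, known: Set[Tuple[str, str]] = KNOWN_STICK_IDS) -> List[str]:
    """Return the lsusb lines whose vendor:product ID is a known USB stick."""
    found = []
    for line in lsusb_output.splitlines():
        m = LSUSB_RE.match(line.strip())
        if m and (m.group(3).lower(), m.group(4).lower()) in known:
            found.append(line.strip())
    return found


def verify_usb_stick_absent(run: Callable[[str], str]) -> None:
    """`run` executes a command on the DUT and returns stdout (hypothetical hook)."""
    sticks = visible_sticks(run("lsusb"))
    if sticks:
        raise UsbStickVisibleError("DUT sees servo USB stick: " + "; ".join(sticks))


fake_run = lambda cmd: "Bus 002 Device 004: ID 13fe:5500 Kingston Technology Company Inc. \n"
try:
    verify_usb_stick_absent(fake_run)
except UsbStickVisibleError as err:
    print(err)  # a real verifier would trigger servo repair here
```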

I've forced a repair for chromeos6-row1-rack23-host19 via cautotest, but it hasn't run yet because of an in-progress CTS test.

Does that mean this bug is fixed? Or do we know about any other DUTs in this state?
Here is a list of DUTs that had a similar issue:

chromeos6-row2-rack22-host19
chromeos6-row1-rack23-host19
chromeos6-row2-rack21-host18
chromeos6-row2-rack20-host12
chromeos6-row1-rack17-host21
chromeos6-row1-rack15-host17
chromeos6-row2-rack20-host10
chromeos6-row2-rack23-host18
chromeos6-row2-rack21-host17
chromeos6-row2-rack21-host19
chromeos6-row1-rack22-host5
chromeos6-row1-rack18-host19
chromeos6-row1-rack16-host17
chromeos6-row2-rack20-host14
chromeos6-row1-rack15-host15
chromeos6-row2-rack22-host13
chromeos6-row1-rack17-host19
chromeos6-row2-rack22-host2
chromeos6-row2-rack22-host20
chromeos6-row2-rack20-host8
chromeos6-row2-rack20-host6
chromeos6-row2-rack21-host1
chromeos6-row1-rack17-host17
chromeos6-row2-rack22-host7
chromeos6-row1-rack22-host17
chromeos6-row2-rack12-host18
chromeos6-row1-rack20-host15
chromeos6-row2-rack15-host16
chromeos6-row1-rack16-host15
chromeos6-row2-rack22-host1
chromeos6-row1-rack22-host19
chromeos6-row4-rack2-host13
chromeos6-row4-rack2-host10
chromeos6-row1-rack22-host13
chromeos6-row4-rack3-host14
chromeos6-row4-rack2-host17
chromeos6-row3-rack3-host5
chromeos6-row4-rack3-host18
chromeos6-row1-rack15-host13
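Forcing repair on a list this long is easier with a small helper that drops repeated hostnames and splits the rest into batches. This sketch assumes the `contrib/repair_hosts` command added by the CL accepts hostnames as positional arguments, like `reverify_hosts`; check its help output before relying on that. The batch size of 10 is arbitrary.

```python
from typing import Iterable, List


def unique_hosts(hosts: Iterable[str]) -> List[str]:
    """Drop blanks and repeats while keeping first-seen order."""
    return list(dict.fromkeys(h.strip() for h in hosts if h.strip()))


def repair_commands(hosts: Iterable[str], batch_size: int = 10) -> List[List[str]]:
    # Assumption: repair_hosts takes hostnames as positional arguments.
    hosts = unique_hosts(hosts)
    return [["contrib/repair_hosts", *hosts[i:i + batch_size]]
            for i in range(0, len(hosts), batch_size)]


for cmd in repair_commands(["chromeos6-row2-rack22-host19",
                            "chromeos6-row1-rack23-host19",
                            "chromeos6-row2-rack22-host19"]):
    print(" ".join(cmd))
```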

My team is resetting the Servos, since some of them have disappeared from servod. After that, we need to force a repair on all of them.
Project Member

Comment 10 by bugdroid1@chromium.org, Aug 11 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/autotest/+/73e74b0d0157ac87e63a9fdca60b31f61bd06848

commit 73e74b0d0157ac87e63a9fdca60b31f61bd06848
Author: Richard Barnette <jrbarnette@chromium.org>
Date: Fri Aug 11 05:47:24 2017

[autotest] Add a 'repair_hosts' command.

We have a `reverify_hosts` command for triggering Verify tasks, but
nothing for triggering Repair tasks.  So, add the command.

BUG= chromium:754362 
TEST=Run against a couple of working, idle DUTs in the lab

Change-Id: I9901d9aa5fb3852bd93013e768681e5e259b15c3
Reviewed-on: https://chromium-review.googlesource.com/611252
Commit-Ready: Richard Barnette <jrbarnette@chromium.org>
Tested-by: Richard Barnette <jrbarnette@chromium.org>
Reviewed-by: Don Garrett <dgarrett@chromium.org>

[modify] https://crrev.com/73e74b0d0157ac87e63a9fdca60b31f61bd06848/contrib/reverify_hosts
[add] https://crrev.com/73e74b0d0157ac87e63a9fdca60b31f61bd06848/contrib/repair_hosts
[modify] https://crrev.com/73e74b0d0157ac87e63a9fdca60b31f61bd06848/server/frontend.py

Project Member

Comment 11 by bugdroid1@chromium.org, Aug 11 2017

The following revision refers to this bug:
  https://chrome-internal.googlesource.com/chromeos/chromeos-admin/+/ed4b82d39ac95ac98dae789be34e105c7ab7f861

commit ed4b82d39ac95ac98dae789be34e105c7ab7f861
Author: Richard Barnette <jrbarnette@google.com>
Date: Fri Aug 11 17:35:53 2017

Owner: jrbarnette@chromium.org
Should this be considered fixed?
Status: WontFix (was: Assigned)
> Should this be considered fixed?

Honestly, I don't know.  If there are any DUTs left in this
state, there's a good chance that the problem requires
manual intervention, in which case this bug isn't the right
vehicle.

So... Let's declare victory (or at least, an end).