USB stick is not being unplugged during normal operation.
Issue description
A few DUTs failed the CTS test because the USB stick was not unplugged, which causes additional partitions to be detected and interferes with the CTS test.

chromeos6-row1-rack23-host19 (FAILED):

localhost ~ # lsusb
Bus 002 Device 004: ID 13fe:5500 Kingston Technology Company Inc.

GPT PMBR size mismatch (6905791 != 15466495) will be corrected by w(rite).
Disk /dev/sda: 7.4 GiB, 7918845952 bytes, 15466496 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 8B6017EC-09B6-FE48-BA68-36892FC16784

Device       Start     End Sectors  Size Type
/dev/sda1  4382720 6873087 2490368  1.2G Microsoft basic data
/dev/sda2    20480   53247   32768   16M ChromeOS kernel
/dev/sda3   286720 4382719 4096000    2G ChromeOS root fs
/dev/sda4    53248   86015   32768   16M ChromeOS kernel
/dev/sda5   282624  286719    4096    2M ChromeOS root fs
/dev/sda6    16448   16448       1  512B ChromeOS kernel
/dev/sda7    16449   16449       1  512B ChromeOS root fs
/dev/sda8    86016  118783   32768   16M Microsoft basic data
/dev/sda9    16450   16450       1  512B ChromeOS reserved
/dev/sda10   16451   16451       1  512B ChromeOS reserved
/dev/sda11      64   16447   16384    8M unknown
/dev/sda12  249856  282623   32768   16M EFI System

Partition table entries are not in disk order.

localhost ~ # df -h | grep sda
/dev/sda1 1.2G 1009M 141M 88% /media/removable/STATE
/dev/sda8 12M 28K 12M 1% /media/removable/OEM

I dug in by checking the servo status. Two settings need to be off in order to disable the USB3 connection from the servo, so that the DUT won't see the USB stick and the stick won't interfere with testing:

prtctl4_pwren:off
usb3_pwr_en:off

Unfortunately, either the settings are still on or the servo has lost its connection to the labstation. For more info, please refer to b/64332561.
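As a rough illustration of the check described above, here is a minimal Python sketch that takes text in the `name:value` form shown (for example `prtctl4_pwren:off usb3_pwr_en:off`) and reports which of the two controls are not off. The function names are hypothetical and are not part of autotest or servo tooling. How the text is obtained from the servo is deliberately left out.

```python
# Hypothetical helper names; only the two control names and the
# "name:value" text form come from the issue description.
REQUIRED_OFF = ("prtctl4_pwren", "usb3_pwr_en")


def parse_controls(text):
    """Parse whitespace-separated 'name:value' tokens into a dict."""
    controls = {}
    for token in text.split():
        name, sep, value = token.partition(":")
        if sep:  # skip tokens that have no ':' separator
            controls[name] = value
    return controls


def controls_not_off(text, required=REQUIRED_OFF):
    """Return the required controls that are missing or not set to 'off'."""
    controls = parse_controls(text)
    return [name for name in required if controls.get(name) != "off"]


if __name__ == "__main__":
    # An empty list means both controls are reported as off.
    print(controls_not_off("prtctl4_pwren:off usb3_pwr_en:on"))
```

A missing control is treated the same as one that is still on, since a servo that cannot report its state is also one of the failure modes described here.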
,
Aug 10 2017
Servo status is checked and reported in every repair or provision
task, so...
$ dut-status -f chromeos6-row1-rack23-host19 | egrep '(provision|repair)' | head -1
2017-08-10 11:25:43 OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos6-row1-rack23-host19/1302809-provision/
Looking in status.log for that provision task, we find this:
START ---- provision timestamp=1502321906 localtime=Aug 09 16:38:26
GOOD ---- verify.servo_ssh timestamp=1502321910 localtime=Aug 09 16:38:30
GOOD ---- verify.update timestamp=1502321912 localtime=Aug 09 16:38:32
GOOD ---- verify.brd_config timestamp=1502321913 localtime=Aug 09 16:38:33
GOOD ---- verify.ser_config timestamp=1502321913 localtime=Aug 09 16:38:33
GOOD ---- verify.job timestamp=1502321914 localtime=Aug 09 16:38:34
FAIL ---- verify.servod timestamp=1502321974 localtime=Aug 09 16:39:34 ''
So, there's something wrong with talking to the servod process for
that DUT. The initialization code that guarantees that the USB stick
gets unplugged and then stays that way has to talk to servod.
This doesn't explain why the USB stick got plugged in in the first
place, but it does explain why it's not getting unplugged now.
Provision only verifies and reports servo status. Forcing a repair
task will trigger automated procedures to get servod working, including
restarting the servod process. So, forcing the DUT through repair is
the first thing to try.
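
For anyone repeating this check across many DUTs, scanning status.log for failed steps can be sketched in a few lines of Python. This assumes the whitespace-separated layout shown in the excerpt above (status, a `----` placeholder, then the step name); the real file may differ in detail, and `failed_steps` is a hypothetical name, not an autotest function.

```python
def failed_steps(status_log_text):
    """Return the step names of all FAIL lines in status.log text.

    Assumes each line looks like:
        FAIL ---- verify.servod timestamp=... localtime=...
    """
    failures = []
    for line in status_log_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0] == "FAIL":
            failures.append(fields[2])
    return failures


if __name__ == "__main__":
    sample = (
        "START ---- provision timestamp=1502321906 localtime=Aug 09 16:38:26\n"
        "GOOD ---- verify.servo_ssh timestamp=1502321910 localtime=Aug 09 16:38:30\n"
        "FAIL ---- verify.servod timestamp=1502321974 localtime=Aug 09 16:39:34 ''\n"
    )
    print(failed_steps(sample))
```

A result that contains `verify.servod` corresponds to the situation described here: servod is not reachable, so the USB stick cannot be unplugged.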
,
Aug 10 2017
So... what is the next step?
,
Aug 10 2017
> So... what is the next step?
Force repair on all the problem children.
There's enough that it ought to be done automatically, but...
there's no script to force repair, only verify.
This CL will add a command that can force repair:
https://chromium-review.googlesource.com/#/c/chromiumos/third_party/autotest/+/611252/
,
Aug 10 2017
That's reactive. Is there some proactive way to prevent this issue from coming back?
,
Aug 10 2017
> That's reactive. Is there some proactive way to prevent this issue from coming back?
Hmmm... Well, we should probably have a different bug for any long-term
preventative strategies. But, for a short summary: we'd have to force
servo repair more aggressively, even in operations when servo isn't
required. For instance, we could call servo repair rather than verify
during provisioning. A more complex possibility would be a verifier
that looks for the USB stick's presence, and triggers servo repair.
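
To make the "look for the USB stick's presence" idea concrete, here is a minimal Python sketch of the detection half only. It assumes that a visible stick shows up as mounts under `/media/removable/` in `df` output, as in the issue description. The function name is hypothetical, and this is not the autotest verifier API. Running `df` on the DUT and triggering servo repair are both left out.

```python
def removable_mounts(df_output, prefix="/media/removable/"):
    """Return (device, mount point) pairs for mounts under `prefix`.

    Expects `df`-style lines whose sixth column is the mount point, e.g.
        /dev/sda1 1.2G 1009M 141M 88% /media/removable/STATE
    """
    mounts = []
    for line in df_output.splitlines():
        # Split into at most 6 fields so a mount point with spaces stays whole.
        fields = line.split(None, 5)
        if len(fields) == 6 and fields[5].startswith(prefix):
            mounts.append((fields[0], fields[5]))
    return mounts


if __name__ == "__main__":
    sample = (
        "/dev/sda1 1.2G 1009M 141M 88% /media/removable/STATE\n"
        "/dev/sda8 12M 28K 12M 1% /media/removable/OEM\n"
    )
    print(removable_mounts(sample))
```

A non-empty result would be the signal to trigger servo repair.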
,
Aug 10 2017
I've forced a repair for chromeos6-row1-rack23-host19 via cautotest, but it hasn't run yet because of an in-progress CTS test.
,
Aug 10 2017
Does that mean this bug is fixed? Or do we know about any other DUTs in this state?
,
Aug 10 2017
Here is a list of DUTs that had a similar issue:
chromeos6-row2-rack22-host19
chromeos6-row1-rack23-host19
chromeos6-row2-rack21-host18
chromeos6-row2-rack20-host12
chromeos6-row1-rack17-host21
chromeos6-row1-rack15-host17
chromeos6-row2-rack20-host10
chromeos6-row2-rack23-host18
chromeos6-row2-rack21-host17
chromeos6-row2-rack21-host19
chromeos6-row1-rack22-host5
chromeos6-row1-rack18-host19
chromeos6-row1-rack16-host17
chromeos6-row2-rack20-host14
chromeos6-row1-rack15-host15
chromeos6-row2-rack22-host13
chromeos6-row1-rack17-host19
chromeos6-row2-rack22-host2
chromeos6-row2-rack22-host20
chromeos6-row2-rack20-host8
chromeos6-row1-rack17-host19
chromeos6-row2-rack20-host6
chromeos6-row2-rack21-host1
chromeos6-row1-rack17-host17
chromeos6-row2-rack22-host7
chromeos6-row1-rack22-host17
chromeos6-row2-rack12-host18
chromeos6-row1-rack20-host15
chromeos6-row2-rack15-host16
chromeos6-row1-rack16-host15
chromeos6-row2-rack22-host1
chromeos6-row1-rack22-host19
chromeos6-row4-rack2-host13
chromeos6-row2-rack22-host7
chromeos6-row4-rack2-host10
chromeos6-row1-rack22-host13
chromeos6-row4-rack3-host14
chromeos6-row4-rack2-host17
chromeos6-row4-rack2-host13
chromeos6-row3-rack3-host5
chromeos6-row2-rack21-host1
chromeos6-row2-rack22-host19
chromeos6-row4-rack3-host18
chromeos6-row1-rack15-host13
My team is resetting the servos, since some of them have disappeared from servod. After that, we need to force repair on all of them.
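
The list above contains a few repeated entries (for example, chromeos6-row1-rack17-host19 appears twice), so it may be worth de-duplicating before forcing repair on each host. A minimal Python sketch, with a hypothetical function name, that keeps first-seen order:

```python
def unique_hosts(text):
    """Return whitespace-separated host names with duplicates removed.

    dict.fromkeys keeps the first occurrence of each name, in order.
    """
    return list(dict.fromkeys(text.split()))


if __name__ == "__main__":
    hosts = unique_hosts(
        "chromeos6-row1-rack17-host19 chromeos6-row2-rack20-host8 "
        "chromeos6-row1-rack17-host19"
    )
    print("\n".join(hosts))
```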
,
Aug 11 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/third_party/autotest/+/73e74b0d0157ac87e63a9fdca60b31f61bd06848

commit 73e74b0d0157ac87e63a9fdca60b31f61bd06848
Author: Richard Barnette <jrbarnette@chromium.org>
Date: Fri Aug 11 05:47:24 2017

[autotest] Add a 'repair_hosts' command.

We have a `reverify_hosts` command for triggering Verify tasks, but
nothing for triggering Repair tasks. So, add the command.

BUG=chromium:754362
TEST=Run against a couple of working, idle DUTs in the lab

Change-Id: I9901d9aa5fb3852bd93013e768681e5e259b15c3
Reviewed-on: https://chromium-review.googlesource.com/611252
Commit-Ready: Richard Barnette <jrbarnette@chromium.org>
Tested-by: Richard Barnette <jrbarnette@chromium.org>
Reviewed-by: Don Garrett <dgarrett@chromium.org>

[modify] https://crrev.com/73e74b0d0157ac87e63a9fdca60b31f61bd06848/contrib/reverify_hosts
[add] https://crrev.com/73e74b0d0157ac87e63a9fdca60b31f61bd06848/contrib/repair_hosts
[modify] https://crrev.com/73e74b0d0157ac87e63a9fdca60b31f61bd06848/server/frontend.py
,
Aug 11 2017
The following revision refers to this bug:
https://chrome-internal.googlesource.com/chromeos/chromeos-admin/+/ed4b82d39ac95ac98dae789be34e105c7ab7f861

commit ed4b82d39ac95ac98dae789be34e105c7ab7f861
Author: Richard Barnette <jrbarnette@google.com>
Date: Fri Aug 11 17:35:53 2017
,
Aug 15 2017
Should this be considered fixed?
,
Sep 14 2017
> Should this be considered fixed?
Honestly, I don't know. If there are any DUTs left in this state,
there's a good chance that the problem requires manual intervention,
in which case this bug isn't the right vehicle.
So... Let's declare victory (or at least, an end).
Comment 1 by rohi...@chromium.org, Aug 10 2017
Status: Assigned (was: Untriaged)