Issue 888934

Issue metadata

Status: Fixed
Owner: jo...@chromium.org
Closed: Sep 25
Cc: vhang@chromium.org
Components: Infra>Labs
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: ----




test_pre_run failing on chromium.perf/Android Nexus5X WebView Perf due to 'build190-b7--*' devices died

Project Member Reported by sheriff-...@appspot.gserviceaccount.com, Sep 25

Issue description

Filed by sheriff-o-matic@appspot.gserviceaccount.com on behalf of nednguyen@google.com

test_pre_run failing on chromium.perf/Android Nexus5X WebView Perf

Builders failed on: 
- Android Nexus5X WebView Perf: 
  https://ci.chromium.org/p/chrome/builders/luci.chrome.ci/Android%20Nexus5X%20WebView%20Perf

Error is:
ValueError: Not enough available machines exist in in swarmingpool. Contact labs to rack in more hardware


 
Cc: vhang@chromium.org
Components: Infra>Labs
Labels: -Pri-2 Pri-1
Summary: test_pre_run failing on chromium.perf/Android Nexus5X WebView Perf due to 'build190-b7--*' devices died (was: test_pre_run failing on chromium.perf/Android Nexus5X WebView Perf)
According to https://logs.chromium.org/logs/chrome/buildbucket/cr-buildbucket.appspot.com/8934428022412973712/+/steps/test_pre_run/0/steps/s__trigger__performance_webview_test_suite_on_Android_device_Nexus_5X/0/stdout, only 14 machines are alive, whereas we expect 15 shards. The alive machines are:

'build188-b7--device1', 'build188-b7--device2', 'build188-b7--device3', 'build188-b7--device4', 'build188-b7--device5', 'build188-b7--device6', 'build188-b7--device7', 'build189-b7--device1', 'build189-b7--device2', 'build189-b7--device3', 'build189-b7--device4', 'build189-b7--device5', 'build189-b7--device6', 'build189-b7--device7'

Somehow we lost all the 'build190-b7--*' devices. Can the lab team check?
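
For reference, the comparison above can be sketched in a few lines of Python. This is only an illustration, not what the perf recipe actually runs: the helper names are made up, the `<host>--device<N>` naming is inferred from the bot IDs in the log, and the expected host list is an assumption based on the hosts mentioned in this thread.

```python
from collections import Counter

# Alive bot IDs, matching the list quoted from the trigger step log.
ALIVE_BOTS = (
    [f"build188-b7--device{i}" for i in range(1, 8)]
    + [f"build189-b7--device{i}" for i in range(1, 8)]
)

# From the thread: the suite expects 15 shards.
EXPECTED_SHARDS = 15

# Assumption: these are the hosts that normally back this pool.
EXPECTED_HOSTS = ["build188-b7", "build189-b7", "build190-b7"]


def bots_per_host(bot_ids):
    """Count alive bots per host, assuming IDs look like '<host>--device<N>'."""
    return Counter(bot_id.split("--", 1)[0] for bot_id in bot_ids)


def missing_hosts(bot_ids, expected_hosts):
    """Return the expected hosts that have no alive bots at all."""
    seen = bots_per_host(bot_ids)
    return [host for host in expected_hosts if seen[host] == 0]


shortfall = EXPECTED_SHARDS - len(ALIVE_BOTS)
print("shards short:", shortfall)
print("hosts with no live bots:", missing_hosts(ALIVE_BOTS, EXPECTED_HOSTS))
```

Grouping by host makes it obvious when a whole host drops out at once, which points at the host (or its USB chain) rather than at individual phones.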
Owner: jo...@chromium.org
lsusb fails/hangs on the host device:

[1] DOCKER chrome-bot@build190-b7:(Linux 14.04):~$ lsusb
<no_output>

This generally means something is locking up the USB chain.

There are quite a few devices on this host (mobile devices + battors). Taking a closer look.
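
If someone wants to turn the "does lsusb come back?" check into a script, a rough sketch might look like the following. The function name and the timeout are made up for illustration, and this is not an existing lab tool. It only checks that the command exits within a deadline, not that its output is sane.

```python
import subprocess


def responds_within(cmd, timeout_s):
    """Return True if cmd exits (with any status) within timeout_s seconds."""
    proc = subprocess.Popen(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    try:
        proc.wait(timeout=timeout_s)
        return True
    except subprocess.TimeoutExpired:
        # Best-effort kill. We deliberately do not wait for the child here:
        # a process blocked inside the kernel on a wedged USB read may not
        # die until that read returns, and waiting would hang this script too.
        proc.kill()
        return False


# Example use on a host (assumes lsusb is installed):
#   if not responds_within(["lsusb"], timeout_s=10.0):
#       print("lsusb did not return; USB chain may be locked up")
```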
strace hangs here:
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f8e532f0000
read(6, "5000\n", 4096)                 = 5
close(6)                                = 0
munmap(0x7f8e532f0000, 4096)            = 0
open("/sys/bus/usb/devices/2-2/descriptors", O_RDONLY) = 6
read(6, 

Looks like it's one of the battors.

Specifically, one of the USB isolators.
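
The strace above shows the hang is a blocking read of a sysfs `descriptors` file. A hedged sketch of how one could find the stuck port without strace: read each `descriptors` file in a throwaway child process with a deadline, and report the ones that never return. The helper names and the timeout are hypothetical, and the glob pattern simply follows the path seen in the strace output.

```python
import glob
import subprocess
import sys

# Tiny child program: open the file given as argv[1] and read all of it.
READER = "import sys; open(sys.argv[1], 'rb').read()"


def usb_descriptor_paths():
    """List sysfs descriptor files, e.g. /sys/bus/usb/devices/2-2/descriptors."""
    return sorted(glob.glob("/sys/bus/usb/devices/*/descriptors"))


def find_stuck_reads(paths, timeout_s=5.0):
    """Return the paths whose read did not finish within timeout_s seconds."""
    stuck = []
    for path in paths:
        # Read in a separate process so a wedged read cannot hang this script.
        proc = subprocess.Popen(
            [sys.executable, "-c", READER, path],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        try:
            proc.wait(timeout=timeout_s)
        except subprocess.TimeoutExpired:
            proc.kill()  # best effort; do not wait on a possibly unkillable child
            stuck.append(path)
    return stuck


# Example use on a host:
#   print(find_stuck_reads(usb_descriptor_paths()))
```

A read that fails quickly (for example with a permission error) is not reported, since the child still exits before the deadline. Only reads that block are flagged.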
Status: Fixed (was: Available)
Swapped the isolator.

lsusb is working again, and it looks like the devices are going into quarantine cooldown.

It's my understanding that we're moving away from battors, but that would involve some decoupling of hardware, which would need to happen across the fleet.

futex(0x7fb4e1a809d0, FUTEX_WAIT, 30631, NULL) = -1 EAGAIN (Resource temporarily unavailable)
close(3)                                = 0
close(4)                                = 0
close(5)                                = 0
exit_group(0)                           = ?
+++ exited with 0 +++
