
Issue 617310

Starred by 4 users

Issue metadata

Status: Fixed
Owner:
Closed: Sep 2016
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: Bug

Blocking:
issue 604568




linux_android_rel_ng runs at maximum swarming capacity during peak times and causes builds to fail

Reported by boliu@chromium.org, Jun 3 2016

Issue description

Owner: bpastene@chromium.org
Status: Assigned (was: Untriaged)
I'll have hwops replace it next week. But for now, I'll just make the host forget about the device so it won't run tests on it.

Comment 4 by boliu@chromium.org, Jun 7 2016

Today's round of "bad" devices; there's more than one this time.

06c05dc8003bb082

https://build.chromium.org/p/tryserver.chromium.android/builders/linux_android_rel_ng/builds/82826
https://build.chromium.org/p/tryserver.chromium.android/builders/linux_android_rel_ng/builds/82622
https://build.chromium.org/p/tryserver.chromium.android/builders/linux_android_rel_ng/builds/82559
https://build.chromium.org/p/tryserver.chromium.android/builders/linux_android_rel_ng/builds/82530

074b21c80059a053

https://build.chromium.org/p/tryserver.chromium.android/builders/linux_android_rel_ng/builds/82600

06b7f7eb00622734

https://build.chromium.org/p/tryserver.chromium.android/builders/linux_android_rel_ng/builds/82535


The second and third devices had only one flake each, but the pattern looks exactly the same: seemingly random tests all fail on the same device.

This is the #1 source of WebView test flakes on the CQ, so it would really help to investigate what's special about these devices and what is causing them to go bad. I'll try going through the device logs, but that usually doesn't turn up much.

Can anyone with physical access to the devices check whether anything is out of whack with them? Maybe the screen is off, or the lock screen is on (completely random guesses, not backed up by anything).
Cc: stip@chromium.org
OK, this is getting a little disconcerting. I'll swing by the lab later today and take a look.
Pulled 06c05dc8003bb082 back with me. I'll play around with it/see if I can't repro locally.

In the meantime, I'll reflash/wipe the other ones and see if that helps.
Reflashed 06b82c6d00622f5c, 038529e4003bfda5, 074b21c80059a053, 06b7f7eb00622734 to KTU48P. Let's see how long that helps.

Looking at 06c05dc8003bb082, which I brought back with me, nothing seems to be amiss. When I get a chance, I'll rerun the tests locally with an eye on logs and see how things go.
Issue 618307 has been merged into this issue.
Cc: jbudorick@chromium.org
Status: Started (was: Assigned)
Started running android_webview_test_apk on one of these flaky phones. Here's hoping I find something useful.

Comment 10 by boliu@chromium.org, Jun 11 2016

Haven't seen issues like this in a few days.
Issue 618795 has been merged into this issue.
Issue 618781 has been merged into this issue.
Labels: -Pri-3 Pri-1
Just merged two other issues into this one. It's still happening; it looks like the try-flakes app caught it this time.
Checked all our bots. This is what I came up with:

https://chromium-swarm.appspot.com/restricted/bot/build9-b4 : was quarantined for 10+ days for device funkiness. Fixed with https://isolateserver.appspot.com/browse?namespace=default-gzip&digest=cac32fb1fdf72c784b69a1b931f097af0e31135e
https://chromium-swarm.appspot.com/restricted/bot/build71-b4 : was offline for 3+ days. Fixed with a host reboot.
https://chromium-swarm.appspot.com/restricted/bot/build20-b4 : hasn't picked up a test in 3+ days due to only having 3 good devices. Needs manual intervention.

https://chromium-swarm.appspot.com/restricted/bot/build1-b4 : failing lots of tests, will investigate
https://chromium-swarm.appspot.com/restricted/bot/build2-b4 : failing more tests than it should, will investigate
https://chromium-swarm.appspot.com/restricted/bot/build66-b4 : failing more tests than it should, will investigate
https://chromium-swarm.appspot.com/restricted/bot/build69-b4 : failing lots of tests, will investigate


Every other bot seems fine.
Cc: bpastene@chromium.org
Issue 620997 has been merged into this issue.
Summary: linux_android_rel_ng runs at maximum swarming capacity during peak times and causes builds to fail (was: Remove "bad" device from android swarm)

Comment 17 by chromium...@appspot.gserviceaccount.com, Jun 18 2016

Detected 3 new flakes for test/step "components_browsertests (with patch)". To see the actual flakes, please visit https://chromium-try-flakes.appspot.com/all_flake_occurrences?key=ahVzfmNocm9taXVtLXRyeS1mbGFrZXNyLwsSBUZsYWtlIiRjb21wb25lbnRzX2Jyb3dzZXJ0ZXN0cyAod2l0aCBwYXRjaCkM. This message was posted automatically by the chromium-try-flakes app.
Latest resuscitations:
https://chromium-swarm.appspot.com/restricted/bot/build91-b4 : offline for over a day, not sure why; logs had entries from only a few hours ago, rebooted
https://chromium-swarm.appspot.com/restricted/bot/build71-b4 : offline for over a day, same reason as previous check: USBErrorNotFound: LIBUSB_ERROR_NOT_FOUND, will have someone from hw-ops take a look at the usb cables, rebooted in the meantime
https://chromium-swarm.appspot.com/restricted/bot/build82-b4 : funky adb error, was quarantined for 12+ hours, rebooted
https://chromium-swarm.appspot.com/restricted/bot/build83-b4 : overheated, and has been for nearly 24 hours, will need to find out why

Also 2 quarantined bots due to a race condition in bot code. Will be fixed with https://chromereviews.googleplex.com/453087013

Also, a bunch of bullhead bots have been quarantined for several days due to the assertion failure at the link below. Will need to resolve.
https://github.com/luci/python-adb/blob/master/adb/contrib/adb_commands_safe.py#L446
Here's today's:
https://chromium-swarm.appspot.com/restricted/bot/build40-b4 : offline for 13 hours due to USBErrorNotFound: LIBUSB_ERROR_NOT_FOUND, I need to ensure that error stops taking bots offline
https://chromium-swarm.appspot.com/restricted/bot/build62-b4 : offline for over an hour due to assertion failure at https://github.com/luci/python-adb/blob/master/adb/contrib/adb_commands_safe.py#L446, also need to fix that
Status: Assigned (was: Started)
Status: Started (was: Assigned)

Comment 22 by stip@chromium.org, Jul 6 2016

More adb_commands_safe assertions taking down the bot today.

Comment 23 by chromium...@appspot.gserviceaccount.com, Jul 7 2016

Detected 3 new flakes for test/step "components_browsertests (with patch)". To see the actual flakes, please visit https://chromium-try-flakes.appspot.com/all_flake_occurrences?key=ahVzfmNocm9taXVtLXRyeS1mbGFrZXNyLwsSBUZsYWtlIiRjb21wb25lbnRzX2Jyb3dzZXJ0ZXN0cyAod2l0aCBwYXRjaCkM. This message was posted automatically by the chromium-try-flakes app.
All recent failures on a specific device of build79-b4. I'll hit it up with a reflash.

Comment 25 by chromium...@appspot.gserviceaccount.com, Jul 8 2016

Detected 3 new flakes for test/step "components_browsertests (with patch)". To see the actual flakes, please visit https://chromium-try-flakes.appspot.com/all_flake_occurrences?key=ahVzfmNocm9taXVtLXRyeS1mbGFrZXNyLwsSBUZsYWtlIiRjb21wb25lbnRzX2Jyb3dzZXJ0ZXN0cyAod2l0aCBwYXRjaCkM. This message was posted automatically by the chromium-try-flakes app.

Comment 26 by stip@chromium.org, Sep 29 2016

Status: Fixed (was: Started)
We haven't seen this issue since the devices were flashed, so let's close this for now.
