linux_android_rel_ng runs at maximum swarming capacity during peak times and causes builds to fail
Issue description
From https://bugs.chromium.org/p/chromium/issues/detail?id=604568#c45

#1 we definitely have a bad device: 06b82c6d00622f5c. All failed tests on these 4 try runs ran on that device, and there is no discernible pattern in the test failures:
https://build.chromium.org/p/tryserver.chromium.android/builders/linux_android_rel_ng/builds/81338
https://build.chromium.org/p/tryserver.chromium.android/builders/linux_android_rel_ng/builds/81380
https://build.chromium.org/p/tryserver.chromium.android/builders/linux_android_rel_ng/builds/81460
https://build.chromium.org/p/tryserver.chromium.android/builders/linux_android_rel_ng/builds/81751

Can we remove that device from the cq, and maybe try to find out why it's so bad?
Jun 4 2016
I'll have hwops replace it next week. But for now, I'll just make the host forget about the device so it won't run tests on it.
Jun 6 2016
Can we take this one off too: 038529e4003bfda5. Same symptom as before: a bunch of unrelated tests all failed on the same device:
https://build.chromium.org/p/chromium.linux/builders/Android%20Tests/builds/27023
https://build.chromium.org/p/tryserver.chromium.android/builders/linux_android_rel_ng/builds/82256
https://build.chromium.org/p/tryserver.chromium.android/builders/linux_android_rel_ng/builds/82205
https://build.chromium.org/p/tryserver.chromium.android/builders/linux_android_rel_ng/builds/82083
https://build.chromium.org/p/tryserver.chromium.android/builders/linux_android_rel_ng/builds/82011

Still very interested in knowing why these devices are "bad".
Jun 7 2016
Today's round of "bad" devices, more than one:

06c05dc8003bb082
https://build.chromium.org/p/tryserver.chromium.android/builders/linux_android_rel_ng/builds/82826
https://build.chromium.org/p/tryserver.chromium.android/builders/linux_android_rel_ng/builds/82622
https://build.chromium.org/p/tryserver.chromium.android/builders/linux_android_rel_ng/builds/82559
https://build.chromium.org/p/tryserver.chromium.android/builders/linux_android_rel_ng/builds/82530

074b21c80059a053
https://build.chromium.org/p/tryserver.chromium.android/builders/linux_android_rel_ng/builds/82600

06b7f7eb00622734
https://build.chromium.org/p/tryserver.chromium.android/builders/linux_android_rel_ng/builds/82535

The second and third only had one flake each, but the pattern looks exactly the same: seemingly random tests all fail on the same device.

This is the #1 source of webview test flakes on the cq, so it would really be helpful to investigate what's special about these devices and what is causing them to go bad. I'll try going through the device logs, but that usually doesn't end up being very helpful. Can anyone with physical access to the devices check if anything is out of whack with them? Maybe the screen is off, or the lock screen is on <- completely random guesses not backed up by anything
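The pattern described above (many unrelated tests failing on one device) can be checked mechanically if the failures are available as (device serial, test name) pairs. The following is only a rough sketch under that assumption: the function name `suspect_devices`, the threshold `min_distinct_tests`, and the test names in the sample are hypothetical and are not taken from the bots or the test runner. Only the device serials come from this thread.

```python
from collections import defaultdict


def suspect_devices(failures, min_distinct_tests=3):
    """Group failures by device and return devices with many distinct failing tests.

    failures: iterable of (device_serial, test_name) pairs (assumed input shape).
    min_distinct_tests: hypothetical threshold, tune to taste.
    """
    tests_by_device = defaultdict(set)
    for serial, test_name in failures:
        tests_by_device[serial].add(test_name)
    # A device is suspect when several *different* tests failed on it,
    # which points at the device rather than at any single test.
    return {
        serial: sorted(tests)
        for serial, tests in tests_by_device.items()
        if len(tests) >= min_distinct_tests
    }


if __name__ == "__main__":
    # Serials are from this thread; the test names are made up for illustration.
    sample = [
        ("06c05dc8003bb082", "HypotheticalTestA"),
        ("06c05dc8003bb082", "HypotheticalTestB"),
        ("06c05dc8003bb082", "HypotheticalTestC"),
        ("074b21c80059a053", "HypotheticalTestA"),
    ]
    for serial, tests in suspect_devices(sample).items():
        print(serial, "failed", len(tests), "distinct tests")
```

A device with a single flake (like the second and third devices above) would not cross a threshold of 3, so a lower threshold or a longer time window might be needed to catch those.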
Jun 7 2016
Ok, this is getting a little disconcerting. I'll swing by the lab later today and take a look.
Jun 7 2016
Pulled 06c05dc8003bb082 back with me. I'll play around with it/see if I can't repro locally. In the meantime, I'll reflash/wipe the other ones and see if that helps.
Jun 8 2016
Reflashed 06b82c6d00622f5c, 038529e4003bfda5, 074b21c80059a053, 06b7f7eb00622734 to KTU48P. Let's see how long that helps. Looking at 06c05dc8003bb082, which I brought back with me, nothing seems to be amiss. When I get a chance, I'll rerun the tests locally with an eye on logs and see how things go.
Jun 8 2016
Issue 618307 has been merged into this issue.
Jun 9 2016
Started running android_webview_test_apk on one of these flaky phones. Here's hoping I find something useful.
Jun 11 2016
Haven't seen issues like this in a few days.
Jun 13 2016
Issue 618795 has been merged into this issue.
Jun 13 2016
Issue 618781 has been merged into this issue.
Jun 13 2016
Just merged two other issues into this. It's still happening; it looks like the try flakes app caught it this time.
Jun 13 2016
Checked all our bots. This is what I came up with:
https://chromium-swarm.appspot.com/restricted/bot/build9-b4 : was quarantined for 10+ days for device funkiness. Fixed with https://isolateserver.appspot.com/browse?namespace=default-gzip&digest=cac32fb1fdf72c784b69a1b931f097af0e31135e
https://chromium-swarm.appspot.com/restricted/bot/build71-b4 : was offline for 3+ days. Fixed with a host reboot.
https://chromium-swarm.appspot.com/restricted/bot/build20-b4 : hasn't picked up a test in 3+ days due to only having 3 good devices. Needs manual intervention.
https://chromium-swarm.appspot.com/restricted/bot/build1-b4 : failing lots of tests, will investigate
https://chromium-swarm.appspot.com/restricted/bot/build2-b4 : failing more tests than it should, will investigate
https://chromium-swarm.appspot.com/restricted/bot/build66-b4 : failing more tests than it should, will investigate
https://chromium-swarm.appspot.com/restricted/bot/build69-b4 : failing lots of tests, will investigate
Every other bot seems fine.
Jun 18 2016
Detected 3 new flakes for test/step "components_browsertests (with patch)". To see the actual flakes, please visit https://chromium-try-flakes.appspot.com/all_flake_occurrences?key=ahVzfmNocm9taXVtLXRyeS1mbGFrZXNyLwsSBUZsYWtlIiRjb21wb25lbnRzX2Jyb3dzZXJ0ZXN0cyAod2l0aCBwYXRjaCkM. This message was posted automatically by the chromium-try-flakes app.
Jun 21 2016
Latest resuscitations:
https://chromium-swarm.appspot.com/restricted/bot/build91-b4 : offline for over a day, not sure why; logs had entries from only a few hours ago. Rebooted.
https://chromium-swarm.appspot.com/restricted/bot/build71-b4 : offline for over a day, same reason as the previous check: USBErrorNotFound: LIBUSB_ERROR_NOT_FOUND. Will have someone from hw-ops take a look at the USB cables; rebooted in the meantime.
https://chromium-swarm.appspot.com/restricted/bot/build82-b4 : funky adb error, was quarantined for 12+ hours. Rebooted.
https://chromium-swarm.appspot.com/restricted/bot/build83-b4 : overheated, and has been for nearly 24 hours. Will need to find out why.
Also 2 quarantined bots due to a race condition in bot code. Will be fixed with https://chromereviews.googleplex.com/453087013
Also also, a bunch of bullhead bots quarantined for several days due to the assertion failure. Will need to resolve.
https://github.com/luci/python-adb/blob/master/adb/contrib/adb_commands_safe.py#L446
Jun 23 2016
Here's today's:
https://chromium-swarm.appspot.com/restricted/bot/build40-b4 : offline for 13 hours due to USBErrorNotFound: LIBUSB_ERROR_NOT_FOUND. I need to ensure that this error stops taking bots offline.
https://chromium-swarm.appspot.com/restricted/bot/build62-b4 : offline for over an hour due to the assertion failure at https://github.com/luci/python-adb/blob/master/adb/contrib/adb_commands_safe.py#L446, also need to fix that
Jul 6 2016
More adb_commands_safe assertions taking down the bot today.
Jul 7 2016
Detected 3 new flakes for test/step "components_browsertests (with patch)". To see the actual flakes, please visit https://chromium-try-flakes.appspot.com/all_flake_occurrences?key=ahVzfmNocm9taXVtLXRyeS1mbGFrZXNyLwsSBUZsYWtlIiRjb21wb25lbnRzX2Jyb3dzZXJ0ZXN0cyAod2l0aCBwYXRjaCkM. This message was posted automatically by the chromium-try-flakes app.
Jul 7 2016
All recent failures are on one specific device on build79-b4. I'll hit it up with a reflash.
Jul 8 2016
Detected 3 new flakes for test/step "components_browsertests (with patch)". To see the actual flakes, please visit https://chromium-try-flakes.appspot.com/all_flake_occurrences?key=ahVzfmNocm9taXVtLXRyeS1mbGFrZXNyLwsSBUZsYWtlIiRjb21wb25lbnRzX2Jyb3dzZXJ0ZXN0cyAod2l0aCBwYXRjaCkM. This message was posted automatically by the chromium-try-flakes app.
Sep 29 2016
We haven't seen this issue since the devices were flashed, so let's close this for now.
Comment 1 by jbudorick@chromium.org, Jun 3 2016