New issue
Advanced search Search tips

Issue 889181 link

Starred by 1 user

Issue metadata

Status: Assigned
Owner:
Cc:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 3
Type: Bug

Blocking:
issue 892225
issue 915319



Sign in to add a comment

Reduce false rejects of android-marshmallow-arm64-rel to 0.

Project Member Reported by erikc...@chromium.org, Sep 25

Issue description

In the period from 9/23-9/25, there have only been 2 false rejects:

https://ci.chromium.org/p/chromium/builders/luci.chromium.try/android-marshmallow-arm64-rel/92575

https://ci.chromium.org/p/chromium/builders/luci.chromium.try/android-marshmallow-arm64-rel/92559

Both had the same signature:
(1) test shard fails at very beginning due to "NoDevicesError: No devices attached."
(2) retries disabled due to "test results TESTS FAILED; retries without patch disabled (build config changes detected)"

(1) implies that the swarming bot self-quarantine script is not working as well as it could -- no tests were even run by the shard -- it just immediately detected that no devices were available and failed.

(2) shows why 'retry with patch' was never run. In this case, config changes should not prevent 'retry with patch' from running, only 'retry without patch'.
 
(2) makes sense for sure. That might be a bit too special case, but it does make sense.

AFAIK the reason we disable retries without patch when build config changes are detected is because we're afraid of CLs changing the logic that tests use to deapply patches; roughly a security risk. But, retrying with patch should always be fine; for those builds, you'd just directly retry the offending tests.
Yeah, (2) seems fine.

Re (1): note that swarming uses a fork of python-adb that we maintain to communicate with devices, while all of our tasks use regular ol' adb. It's possible that either one can interact with a device while the other can't, or that the handoff between the two gets fumbled on occasion.
Cc: bpastene@chromium.org mar...@chromium.org
+ maruel, bpastene

maruel -- do you have context for c#2? Why do we have a fork of python-adb?
I forked python-adb a long time ago as the upstream version wasn't fault-tolerant, and I needed to support parallel channels. I tried to upstream back a bit but it was too much work for the value, so I gave up.

adb (the executable) was notoriously crappy, even worse than the python implementation, so we decided to use the python driver instead a few years ago.
> "upstream version wasn't fault-tolerant"

Please clarify?

> I needed to support parallel channels

What is this for?


I'm asking because we currently use adb for the tasks themselves, and having this divergence is potentially a cause of inconsistencies in quarantining. 
and parallel channels is the capability of running commands and streaming data simultaneously. Or running two commands simultaneously. It's an internal details.
How strongly would you feel about using regular adb to do the quarantine check from swarming get_state()?
Can you now run adb without starting a daemon? And is it possible to shut it down nicely? That was the big problem before, i.e. it would hang randomly.
> How strongly would you feel about using regular adb to do the quarantine check from swarming get_state()?

I'm all for it. It's been something that I've wanted to do for ages but never had the time. Though, properly making the switch would be a tricky and non-trivial infrastructure change, so I'd like to see a doc or detailed design before it gets implemented.
To be clear, I have zero stake in this, I don't mind.

Other internal adb users would need to be looped it first. I think they would appreciate too.

Anyhow, this is a blocker to switch the code to Go, so that's one additional value out of this.
Blocking: 892225
Labels: Infra-Platform-Test
Blocking: 915319

Sign in to add a comment