Issue 860074

Issue metadata

Status: Assigned
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Bug



chromevox_tests occasionally timeouts on chromium.memory/Linux ChromiumOS MSan Tests

Project Member Reported by sheriff-...@appspot.gserviceaccount.com, Jul 3

Issue description

Filed by sheriff-o-matic@appspot.gserviceaccount.com on behalf of spqchan@google.com

chromevox_tests failing on chromium.memory/Linux ChromiumOS MSan Tests

Builders failed on: 
- Linux ChromiumOS MSan Tests: 
  https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Linux%20ChromiumOS%20MSan%20Tests


 
Components: Infra>Platform>Swarming
Summary: chromevox_tests occasionally timeouts on chromium.memory/Linux ChromiumOS MSan Tests (was: chromevox_tests failing on chromium.memory/Linux ChromiumOS MSan Tests)
I checked several failing results:
https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Linux%20ChromiumOS%20MSan%20Tests/7841
https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Linux%20ChromiumOS%20MSan%20Tests/7839
https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Linux%20ChromiumOS%20MSan%20Tests/7836
https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Linux%20ChromiumOS%20MSan%20Tests/7826

In 7839, shards #0, #1, and #2 timed out, but in the others, only shard #2 timed out.
I'm not familiar with the algorithm that splits tests across shards, but if tests from the same file are supposed to run in the same shard, then my guess is that the timeouts hit tests at random.
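
For reference, here is a minimal sketch of one common sharding scheme: assigning tests to shards round-robin by index. Whether the launcher for chromevox_tests splits tests this way is an assumption, and `shard_tests` and the test names are hypothetical. Under this scheme, tests from the same file are spread across shards rather than kept together.

```python
def shard_tests(test_names, total_shards):
    """Assign tests to shards round-robin by position in the list.

    Hypothetical helper: the real launcher's splitting rule may differ.
    """
    shards = [[] for _ in range(total_shards)]
    for index, name in enumerate(test_names):
        # Test i goes to shard (i mod total_shards).
        shards[index % total_shards].append(name)
    return shards


# Made-up test names: three tests from file A, two from file B.
tests = ["A.One", "A.Two", "A.Three", "B.One", "B.Two"]
for shard_index, shard in enumerate(shard_tests(tests, 3)):
    print(shard_index, shard)
```

If the real split resembles this, a consistently slow shard #2 would point at the bot or at total runtime rather than at one particular test file.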

Looking at successful builds (https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Linux%20ChromiumOS%20MSan%20Tests/7842 for example), shard #2 looks relatively slow, so I suspect that sometimes bots become very slow and shard #2 is likely to time out.

Let me add the Swarming component (which is auto-suggested, awesome!).
Components: -Infra>Platform>Swarming Infra>Platform>Swarming>Admin
Labels: Type-Bug
https://chromium-swarm.appspot.com/task?id=3e83309ed8fc0d10 TIMED_OUT;

Ran for 1h, which is the maximum allocated time. It's too slow. You need to make the shard run faster.

In practice, the test looks like it's failing, so fix that.
Labels: -Sheriff-Chromium
Owner: dmazz...@chromium.org
Assigning to dmazzoni@ for further triaging
Owner: mar...@chromium.org
Status: Assigned (was: Available)
It looks like the vast majority of tests are taking 2 - 3 minutes when they succeed.

The same tests run in 4 - 5 seconds on linux-chromeos-rel, 15 - 20 seconds on linux-chromeos-dbg, and 25 - 30 seconds on linux_chromium_chromeos_asan_rel_ng. Those seem like reasonable numbers to me for browser tests.

Some of these tests are literally 1 line. The overhead comes from starting one browser_test and waiting for one extension to load. I sincerely doubt there's a lot of low-hanging fruit to optimize.

MSAN is just slow. The only options I see are to allocate more time, shard more, or exclude this test suite from MSAN.
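
As a rough check on the numbers above, the sketch below computes the slowdown range implied by the quoted per-test times and estimates how many shards would fit under the 1h limit. The test count of 100 and the assumption that each shard runs its tests serially are made up for illustration; `slowdown_range` and `min_shards` are hypothetical helpers.

```python
import math


def slowdown_range(baseline_s, slow_s):
    """Return (min, max) slowdown given (low, high) duration ranges in seconds."""
    return slow_s[0] / baseline_s[1], slow_s[1] / baseline_s[0]


def min_shards(num_tests, seconds_per_test, timeout_s):
    """Fewest shards such that each shard, run serially, fits in timeout_s."""
    return math.ceil(num_tests * seconds_per_test / timeout_s)


msan = (120, 180)  # 2 - 3 minutes per test on MSan, from the comment above
print(slowdown_range((4, 5), msan))    # vs linux-chromeos-rel
print(slowdown_range((15, 20), msan))  # vs linux-chromeos-dbg
print(slowdown_range((25, 30), msan))  # vs linux_chromium_chromeos_asan_rel_ng

# Assumed: 100 tests at the 3-minute worst case, 1h (3600 s) shard limit.
print(min_shards(100, 180, 3600))
```

With these inputs the MSan slowdown is roughly 24x to 45x relative to the release builder and 4x to about 7x relative to ASan.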

M-A, what would you like to do?

Owner: jbudorick@chromium.org
I do not manage capacity; forwarding to John for load assessment.
Cc: dmazz...@chromium.org
MSAN may be slow, but having individual tests take >50x longer under MSAN seems excessive. MSAN doesn't appear to be that slow for comparable suites. Can y'all investigate what's going on here a bit more before we throw more capacity at this?
Cc: -dmazz...@chromium.org jbudorick@chromium.org
Owner: dmazz...@chromium.org
-> dmazzoni to answer #6.
No luck yet. I tried several times to run a local MSan build, but there are a lot of steps (on Rodete) and it's easy to get them wrong. Note that a regular Chrome build is easier; getting a Chrome OS build working has additional issues.

Haven't given up yet.