
Issue 794372

Starred by 6 users

Issue metadata

Status: Available
Owner: ----
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: Bug




linux asan tests appear to be significantly slower than regular tests

Project Member Reported by dpranke@chromium.org, Dec 13 2017

Issue description

migrating from  bug 793993  ...

Some tests run under ASan/LSan on the waterfalls appear to be *significantly* slower than the tests run under a regular release build.

Compare, for example:

https://ci.chromium.org/buildbot/chromium.linux/Linux%20Tests/65487
https://ci.chromium.org/buildbot/chromium.memory/Linux%20ASan%20LSan%20Tests%20%281%29/40807

blink_heap_unittests takes 5s on release, 37s under Asan, a 7x slowdown:

https://chromium-swarm.appspot.com/user/task/3a60c4d6b2f11f10
https://chromium-swarm.appspot.com/user/task/3a60df2818889e10

net_unittests goes from 8m to 121m (split across 4 shards), a 15x slowdown.

webkit_unit_tests goes from 30s to 2460s, an 80x slowdown:

https://chromium-swarm.appspot.com/user/task/3a60c5666b162910
https://chromium-swarm.appspot.com/user/task/3a60df87d19cde10
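
As a quick sanity check on the ratios quoted above, here is a small sketch that recomputes them from the stated durations. The durations are taken from this description (net_unittests converted from minutes to seconds); the dictionary and function names are only illustrative.

```python
# Durations quoted in the issue description, in seconds.
# Each entry is (release_seconds, asan_seconds).
durations = {
    "blink_heap_unittests": (5, 37),
    "net_unittests": (8 * 60, 121 * 60),
    "webkit_unit_tests": (30, 2460),
}


def slowdown(release_s, asan_s):
    """Return how many times slower the ASan run is than the release run."""
    return asan_s / release_s


for name, (release_s, asan_s) in durations.items():
    print(f"{name}: {slowdown(release_s, asan_s):.1f}x")
```

The results land at roughly 7x, 15x and 82x, in line with the rounded figures above.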

Here's a plx query to show the breakdowns for all of the tests:

SELECT tags_master as master, 
        tags_buildername as builder,
        tags_stepname as stepname, 
        sum(completed_ts - started_ts) as dur,        
        sum(cost_usd) as cost,
        (sum(cost_usd) / count (distinct tags_build_id)) as cost_per_build
 FROM 
   FLATTEN(FLATTEN(FLATTEN(FLATTEN(FLATTEN(chrome_infra.swarming_tasks.yesterday, tags_project), 
                                           tags_master), 
                                   tags_buildername), 
                           tags_stepname),
           tags_build_id)
WHERE state = 'COMPLETED'
  and tags_project = 'chromium'
  and ((tags_master = 'chromium.linux' and tags_buildername = 'Linux Tests' and tags_build_id = '65487') or 
       (tags_master = 'chromium.memory' and tags_buildername = 'Linux ASan LSan Tests (1)' and tags_build_id = '40806'))
  and completed_ts > (PARSE_UTC_USEC('2017-12-11') / 1000000)
  and completed_ts < (PARSE_UTC_USEC('2017-12-11') / 1000000) + 86400
GROUP BY master, builder, stepname
ORDER BY master asc, builder asc, stepname asc

with the results attached

kcc@, can you help us find people to dig into what's going on here? Is it possible the bots are overly resource constrained and thrashing, or something? We are sorely tempted to turn off some of the worst offenders here, because the slowdown simply isn't acceptable even though we catch a lot of bugs w/ ASan and LSan.

But, I'm optimistic we can figure out what's going on fairly easily.
 
linux_v_asan_test_times.csv (8.0 KB)
FWIW, when I looked at this, it didn't seem like this is a recent explosion in time. I saw graphs which slowly crept up, but were always 10X plus slower than on other builders.
Potentially related to bug 791698
#0: I glanced at the resource metrics from a bot graph from viceroy during a run of browser_tests on ASAN and didn't notice anything too thrashy. IIRC, memory usage was <50%. (CPU usage was high but not 100%.)

Admittedly anecdata.
I believe I saw significant slowdowns for things like net_unittests back in May, though I might be getting that confused w/ MSan. I don't recall the 80x differences, though.

Comment 5 by kcc@google.com, Dec 13 2017

Is this something I can reproduce locally? 
If not, I'm afraid I don't want to own it :) 
I did see at least 8x slowdowns locally, so at least some of it. 

You should be able to easily build locally and test it on the bots using swarming, as well. We can help if need be.

Comment 7 by kcc@google.com, Dec 13 2017

> You should be able to easily build locally

That I can do. 
What is the exact build config? 
is_asan = true is_debug = false ? 

> test it on the bots using swarming

Sorry, I won't go there (and I have no one on the team to help, sorry again)

Comment 8 by mmoroz@chromium.org, Dec 13 2017

Kostya, I can take a look if it requires any special Chrome-related actions (but can also ask questions regarding LLVM side involved here :) ).

Comment 9 by kcc@google.com, Dec 13 2017

This is what I've done in a fresh chromium checkout: 
gn gen out/opt '--args=is_debug=false' --check
gn gen out/asan '--args=is_asan=true is_debug=false' --check
ninja -C out/asan net_unittests 
ninja -C out/opt net_unittests 

for f in asan opt ; do ./out/$f/net_unittests  --single-process-tests > $f.log 2>&1; done 

asan passes in 10m: 
[==========] 24028 tests from 838 test cases ran. (608418 ms total)

opt got stuck: 
[ RUN      ] ProxyConfigServiceLinuxTest.KDEFileChanged
[221565:222383:1213/073132.067866:2475260036086:ERROR:proxy_config_service_linux.cc(1277)] Unable to set up proxy configuration change notifications
opt.log lines 10203-10257/10257 (END)
<hangs>

So, single-process net_unittests runs for 10 minutes, not even close to 121m mentioned above. 

I failed to build blink_heap_unittests and webkit_unit_tests

../../ui/accessibility/ax_node_data.h:17:10: fatal error: 'ui/accessibility/ax_enums.h' file not found
#include "ui/accessibility/ax_enums.h"
         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~


Please advise on a local experiment that can demonstrate the slowdown. 
If the slowdown reproduces only on the bots, then it's most likely an infra problem 
(wrong build flags, not enough RAM, ...) and I am not ready to own it

Comment 10 by kcc@google.com, Dec 13 2017

base_unittests runs ~ the same time with and w/o asan (11 vs 13 seconds)
I think machine configuration (or the environment) must be part of the problem. 

It's relatively easy to download the binaries used to run a given task and run it locally, and also to build locally and then trigger the task under swarming, so that should allow us to do some better troubleshooting even if we can't repro the extreme slowdowns locally. I'll post more instructions later this morning.

Comment 12 by kcc@chromium.org, Dec 19 2017

Cc: kcc@chromium.org
Owner: ----
Un-owning the bug to reflect the fact that I am not working on it. 
If/when there is a local reproducer feel free to assign it back.
Again, I am sorry, but I don't have capacity to debug problems on unfamiliar infra (i.e. other than locally) 
Owner: dpranke@chromium.org
Status: Assigned (was: Untriaged)
@kcc - no problem. I hadn't updated the bug because I was actually working on things that would make it easier for you to reproduce issues. That work has landed, and so I'm going to see if I can come up w/ a repro case and will bounce it back if I can.
I looked at this some yesterday and, as I noted over in https://bugs.chromium.org/p/chromium/issues/detail?id=736521#c17, am somewhat suspicious of a couple of runtime flags, --test-launcher-batch-limit=1 and --test-launcher-print-test-stdio=always. I'm planning to move them from the deep dark recipe hole from which they currently get added to the tests over into the src-side spec, and then we can look at removing them.
Project Member

Comment 15 by bugdroid1@chromium.org, Jan 23 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/edfe7f871492931def83874a0f4023df7232dc58

commit edfe7f871492931def83874a0f4023df7232dc58
Author: John Budorick <jbudorick@chromium.org>
Date: Tue Jan 23 15:27:22 2018

chromium.memory: Surface test launcher args used by the Linux ASAN bot.

This also adds the ability to specify args on a per-bot basis
in waterfalls.pyl.

Bug: 736521, 794372
Change-Id: I83af8884fccbe3937e4a46773389b4a0aebf2267
Reviewed-on: https://chromium-review.googlesource.com/876531
Commit-Queue: John Budorick <jbudorick@chromium.org>
Reviewed-by: Kenneth Russell <kbr@chromium.org>
Reviewed-by: Dirk Pranke <dpranke@chromium.org>
Cr-Commit-Position: refs/heads/master@{#531240}
[modify] https://crrev.com/edfe7f871492931def83874a0f4023df7232dc58/testing/buildbot/chromium.memory.json
[modify] https://crrev.com/edfe7f871492931def83874a0f4023df7232dc58/testing/buildbot/generate_buildbot_json.py
[modify] https://crrev.com/edfe7f871492931def83874a0f4023df7232dc58/testing/buildbot/generate_buildbot_json_unittest.py
[modify] https://crrev.com/edfe7f871492931def83874a0f4023df7232dc58/testing/buildbot/waterfalls.pyl

Owner: jbudorick@chromium.org
Status: Started (was: Assigned)
I'm not sure what the state of this is at this point, and I'm probably not the person who is going to work on it further. Punting to jbudorick to own or reassign.
I'm still working on argument relocation & hopefully removal, per #14. Got distracted by trooper things.

Comment 19 by r...@chromium.org, Apr 17 2018

My theory is that ASan has some process startup overhead. Reserving 20TB of shadow memory virtual address space is not free, and perhaps it is more expensive on GCE VMs.

I only have one data point to support this theory, though, and it is that local net_unittests runs take 519s with the normal parallel test launcher, but they take 713s when I run with --single-process-tests, disabling the parallelism. In other words, using 56-way parallelism on a Z840 only gets a 37% speedup.

Is there a way to make it so the test launcher reuses the same process for more test runs? That would test the theory, and if it is correct, reduce the total amount of work to run these tests. We'd still probably want to shard net_unittests up more.
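
To put the two timings above in perspective, here is a rough sketch of the speedup and per-worker efficiency they imply. It treats the --single-process-tests run as the serial baseline and assumes all 56 workers were usable, which is a simplification; the function names are only illustrative.

```python
def parallel_speedup(serial_s, parallel_s):
    """Ratio of serial wall time to parallel wall time."""
    return serial_s / parallel_s


def parallel_efficiency(serial_s, parallel_s, workers):
    """Speedup divided by worker count; 1.0 would be perfect scaling."""
    return parallel_speedup(serial_s, parallel_s) / workers


# 713 s with --single-process-tests, 519 s with the parallel launcher.
speedup = parallel_speedup(713, 519)
efficiency = parallel_efficiency(713, 519, 56)
print(f"speedup: {speedup:.2f}x, efficiency: {efficiency:.1%}")
```

An efficiency of only a few percent is consistent with a large fixed cost per process, though it does not prove that startup is where the time goes.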

Comment 20 by bugdroid1@chromium.org, Apr 18 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/6b83624e92eb2229a678a087fffce57d939248e4

commit 6b83624e92eb2229a678a087fffce57d939248e4
Author: Reid Kleckner <rnk@google.com>
Date: Wed Apr 18 17:37:33 2018

Shard net_unittests on ToTLinuxASan 16 ways to match memory bot

BUG=chromium:794372
R=thakis@chromium.org
NOTRY=True

Change-Id: Iaef5d1b0e4fcb155d90de714187e16496fdfdee1
Reviewed-on: https://chromium-review.googlesource.com/1015847
Commit-Queue: Reid Kleckner <rnk@chromium.org>
Reviewed-by: Nico Weber <thakis@chromium.org>
Cr-Commit-Position: refs/heads/master@{#551732}
[modify] https://crrev.com/6b83624e92eb2229a678a087fffce57d939248e4/testing/buildbot/chromium.clang.json
[modify] https://crrev.com/6b83624e92eb2229a678a087fffce57d939248e4/testing/buildbot/test_suite_exceptions.pyl

Issue 814491 has been merged into this issue.

Comment 22 by bugdroid1@chromium.org, Jun 11 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/4ef47d5f79c6dcbc52f8cf29ead1491f146bc827

commit 4ef47d5f79c6dcbc52f8cf29ead1491f146bc827
Author: Takuto Ikuta <tikuta@chromium.org>
Date: Mon Jun 11 13:15:36 2018

Increase swarming shards for some asan tests

viz_browser_tests, viz_content_browsertests and content_browsertest take more than 20 minutes on linux asan bot.
This patch increases the shard for such slow tests.
* viz_browser_tests, 10 -> 20: max shard duration become 14 mins from > 25 mins
* viz_content_browsertests, 2 -> 8: max shard duration become 15 mins from > 50 mins
* content_browsertests, 4 -> 8: max shard duration become 16 mins from > 25 mins

linux_chromium_asan_rel_ng has the slowest average CQ cycle time, so reducing test execution time is effective for overall CQ cycle time.
1 week stats of average cycle time of each try builder: http://shortn/_bEvlzGZaH9


Recent history of tests on linux_chromium_asan_rel_ng:
* viz_browser_tests
https://chromium-swarm.appspot.com/tasklist?c=name&c=state&c=created_ts&c=duration&c=pending_time&c=pool&c=bot&et=1528691160000&f=state%3ACOMPLETED_SUCCESS&f=name-tag%3Aviz_browser_tests&f=buildername-tag%3Alinux_chromium_asan_rel_ng&l=50&s=duration%3Adesc&st=1528604760000
* viz_content_browsertests
https://chromium-swarm.appspot.com/tasklist?c=name&c=state&c=created_ts&c=duration&c=pending_time&c=pool&c=bot&et=1528691160000&f=state%3ACOMPLETED_SUCCESS&f=buildername-tag%3Alinux_chromium_asan_rel_ng&f=name-tag%3Aviz_content_browsertests&l=50&s=duration%3Adesc&st=1528604760000
* content_browsertests
https://chromium-swarm.appspot.com/tasklist?c=name&c=state&c=created_ts&c=duration&c=pending_time&c=pool&c=bot&et=1528691160000&f=state%3ACOMPLETED_SUCCESS&f=buildername-tag%3Alinux_chromium_asan_rel_ng&f=name-tag%3Acontent_browsertests&l=50&s=duration%3Adesc&st=1528604760000

Bug: 794372
Change-Id: Ic5a4daae81890e702b9090055ed30e65bea92231
Reviewed-on: https://chromium-review.googlesource.com/1094816
Reviewed-by: Nico Weber <thakis@chromium.org>
Reviewed-by: Marc-Antoine Ruel <maruel@chromium.org>
Commit-Queue: Marc-Antoine Ruel <maruel@chromium.org>
Cr-Commit-Position: refs/heads/master@{#565984}
[modify] https://crrev.com/4ef47d5f79c6dcbc52f8cf29ead1491f146bc827/testing/buildbot/chromium.memory.json
[modify] https://crrev.com/4ef47d5f79c6dcbc52f8cf29ead1491f146bc827/testing/buildbot/test_suite_exceptions.pyl

Owner: tikuta@chromium.org
Status: Assigned (was: Started)
I noticed that googletest initialization consumes a lot of CPU in asan net_unittests.

In the perf profile, testing::internal::UnitTestImpl::GetTestCase accounts for nearly 40% of CPU time.
https://cs.chromium.org/chromium/src/third_party/mesa/src/src/gtest/src/gtest.cc?l=4096&rcl=9d9b0710470f581cb5485b02b6acd8415cc093e8

The slowness appears to come from the asan-specific strcmp function used inside std::find_if.
ref: https://github.com/llvm-project/llvm-project-20170507/commit/8df65c1de2e1f82d828b7e8314ddcfaabacb94b9

So I tried changing gtest so that it does not use strcmp there.
https://chromium-review.googlesource.com/c/chromium/src/+/1096591

In my try runs, the max shard duration of net_unittests became:
18:40 in https://ci.chromium.org/p/chromium/builders/luci.chromium.try/linux_chromium_asan_rel_ng/32387
18:34 in https://ci.chromium.org/p/chromium/builders/luci.chromium.try/linux_chromium_asan_rel_ng/32445


This is roughly 1.5x faster than the usual duration of net_unittests, e.g.:
28:08 in https://ci.chromium.org/p/chromium/builders/luci.chromium.try/linux_chromium_asan_rel_ng/32409
29:11 in https://ci.chromium.org/p/chromium/builders/luci.chromium.try/linux_chromium_asan_rel_ng/32406
27:33 in https://ci.chromium.org/p/chromium/builders/luci.chromium.try/linux_chromium_asan_rel_ng/32252

Sadly, the change does not appear to reduce the time of other tests.

I will send a PR to googletest.
Still, I hope ASan builds can optimize the injected functions more.

Made a PR.
https://github.com/google/googletest/pull/1627

This will make net_unittests 2x faster on the asan builder.

Nice!

Comment 26 by r...@chromium.org, Jun 12 2018

Cool! The way GTest registers all tests with dynamic initialization is really inefficient. It caused huge startup time problems for us back when we ran valgrind and dr. memory (Google internal bug about it: b/6304344).

It might be worth it to push some kind of static section concatenation registration scheme upstream. It would probably improve startup time for all chrome test binaries, both with ASan and without. It might be worth it...
I don't understand why https://github.com/google/googletest/pull/1627/files helps. Is std::find_if with forward iterators transformed to a call to strcmp() but with reverse iterators isn't? And instrumented strcmp() is much slower than a regular search loop with asan? If so, maybe we shouldn't do the strcmp transform in llvm when asan is enabled? Or is there a different reason why that patch helps?
In net_unittests, there are nearly 40k tests.

And GetTestCase is called from AddTestInfo
https://cs.chromium.org/chromium/src/third_party/googletest/src/googletest/src/gtest-internal-inl.h?l=657&rcl=9077ec7efe5b652468ab051e93c67589d5cb8f85

AddTestInfo is called from MakeAndRegisterTestInfo
https://cs.chromium.org/chromium/src/third_party/googletest/src/googletest/src/gtest.cc?l=2573&rcl=9077ec7efe5b652468ab051e93c67589d5cb8f85

MakeAndRegisterTestInfo is called once per test, with the test case name, to initialize a global static variable.
That means GetTestCase is almost always called with the same test_case_name as the previous call.
So the TestCase is either the last element of test_case_ or not present at all.

With a forward iterator, we have to apply strcmp to every element of test_case_ before reaching the match. With a reverse iterator, we either find the test case with a single strcmp or fail to find it, and the not-found case is infrequent.
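
The effect of the search direction can be illustrated with a small model. This is a Python stand-in, not gtest code: the list plays the role of the registered test cases, each `==` stands for one strcmp, and the workload (200 test cases with 50 tests each, registered case by case) is hypothetical.

```python
def find_forward(cases, name):
    """Scan from the front; return (index or None, comparisons made)."""
    comparisons = 0
    for i, existing in enumerate(cases):
        comparisons += 1
        if existing == name:
            return i, comparisons
    return None, comparisons


def find_reverse(cases, name):
    """Scan from the back; return (index or None, comparisons made)."""
    comparisons = 0
    for i in range(len(cases) - 1, -1, -1):
        comparisons += 1
        if cases[i] == name:
            return i, comparisons
    return None, comparisons


def register_all(test_case_names, find):
    """Register tests in order and return the total number of comparisons."""
    cases = []
    total = 0
    for name in test_case_names:
        index, comparisons = find(cases, name)
        total += comparisons
        if index is None:
            cases.append(name)  # first test of a new test case
    return total


# Hypothetical workload: tests arrive grouped by test case, as in registration.
names = [f"Case{c}" for c in range(200) for _ in range(50)]
forward_total = register_all(names, find_forward)
reverse_total = register_all(names, find_reverse)
print(f"forward: {forward_total} comparisons, reverse: {reverse_total}")
```

In this model the forward scan grows roughly with tests times test cases, while the reverse scan needs about one comparison per test plus one full scan per new test case. Any extra per-call cost in an instrumented strcmp is multiplied by that difference.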
> And instrumented strcmp() is much slower than a regular search loop with asan?

Maybe.

> If so, maybe we shouldn't do the strcmp transform in llvm when asan is enabled? Or is there a different reason why that patch helps?

Not sure. Basically, if we use std::string, the comparison is pruned by a size check before strcmp.
If a program uses strcmp heavily, it might be better to find a way to optimize that (as in gtest).

Comment 30 by bugdroid1@chromium.org, Jun 14 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/a8353999c7b25aa2d09c6b4dfe3a91347637b4ce

commit a8353999c7b25aa2d09c6b4dfe3a91347637b4ce
Author: Takuto Ikuta <tikuta@chromium.org>
Date: Thu Jun 14 13:02:46 2018

Roll src/third_party/googletest/src/ 9077ec7ef..ce468a17c (5 commits)

Reduces the max shard duration of net_unittest on the asan/tsan builders by about 50%.

https://chromium.googlesource.com/external/github.com/google/googletest.git/+log/9077ec7efe5b..ce468a17c434

$ git log 9077ec7ef..ce468a17c --date=short --no-merges --format='%ad %ae %s'
2018-06-13 misterg Docs sync/internal
2018-06-13 misterg Doc sync/internal
2018-06-12 tikuta Reduce the number of strcmp calling while initialization
2018-06-11 misterg Sync with internal docs
2018-06-11 misterg Sync with internal docs

Created with:
  roll-dep src/third_party/googletest/src
R=dpranke@chromium.org,thakis@chromium.org
BUG=794372

Change-Id: I704490e983697784fcc73c6fa7462bfb35a0694e
Reviewed-on: https://chromium-review.googlesource.com/1100670
Commit-Queue: Nico Weber <thakis@chromium.org>
Reviewed-by: Nico Weber <thakis@chromium.org>
Cr-Commit-Position: refs/heads/master@{#567237}
[modify] https://crrev.com/a8353999c7b25aa2d09c6b4dfe3a91347637b4ce/DEPS

Comparing:
https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Linux%20ASan%20LSan%20Tests%20%281%29/46826 (a few days before tikuta's roll)
https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Linux%20ASan%20LSan%20Tests%20%281%29/47025 (a few days after)

net_unittests is now about 3x as fast. Before, 16 shards at about 1500 s each, for unsharded 400 minutes / 6.7 hours total runtime. Now, 16 shards at about 500 s each, for about 133 min / 2.2 h total runtime.

Meanwhile, on https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Linux%20ASan%20LSan%20Tests%20%281%29/47025 without asan, net_unittests runs on a single shard in a bit under 4 minutes.

So that helped a lot, but comment 0 had a 15x slowdown from asan for net_unittests, while that change got us from a 100x slowdown to merely a 33x slowdown -- it's better than before, but less good than where we were when this bug got filed :-/
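
The arithmetic behind these numbers, as a sketch: shard counts and per-shard times are the approximate figures quoted in this comment, and the non-ASan baseline is rounded up to 4 minutes, so the ratios are approximate too.

```python
def total_runtime_s(shards, seconds_per_shard):
    """Unsharded total: every shard's runtime added together."""
    return shards * seconds_per_shard


before_s = total_runtime_s(16, 1500)  # before the googletest roll
after_s = total_runtime_s(16, 500)    # after the googletest roll
baseline_s = 4 * 60                   # single non-ASan shard, "a bit under 4 minutes"

print(f"before: {before_s / 60:.0f} min, {before_s / baseline_s:.0f}x slowdown")
print(f"after:  {after_s / 60:.0f} min, {after_s / baseline_s:.0f}x slowdown")
```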
Yeah, googletest initialization still takes time in these tests: 42% out of 52% of CPU consumption.

Not sure why the situation got worse; I guess some changes in the sanitizer made things worse.
perf net_unittests.png (145 KB)
https://plx.corp.google.com/scripts2/script_5b._2451af_0000_29d7_8f88_94eb2c14e7be

Hmm, there seems to be a gradual increase from the end of January to the end of March.

Owner: ----
Status: Available (was: Assigned)
I did some further optimization of googletest.

The more memory allocations I removed from net_unittests, the more the execution speed improved.
This patch runs around 2 times faster:
https://chromium-review.googlesource.com/c/chromium/src/+/1112880/22


But why is memory allocation under asan so slow in this test?
In net_unittests, many of the memory allocations seem to come from parameterized tests.
Is the allocation pattern of parameterized tests particularly bad for the asan allocator?

When I look at the perf result of net_unittests on master, I see a lot of time spent in __sanitizer::StackDepotBase<__sanitizer::StackDepotNode, 1, 20>::Put.
The hotspot seems to be around https://github.com/llvm-project/llvm-project-20170507/blob/135c4a0f43501987991610a289cf36758dbc5f8a/compiler-rt/lib/sanitizer_common/sanitizer_stackdepotbase.h#L105

But I'm not sure whether it can be optimized further.
current.png (294 KB)
tikuta, were you able to repro this locally? What was your local setup? kcc said he'd investigate if he gets explicit steps on how to repro locally.
Cc: tikuta@chromium.org
#35, I forgot to add myself to cc and to star the bug.

The instructions below can be used.

I used the following args.gn to build net_unittests.
```
dcheck_always_on = true
is_asan = true
is_component_build = false
is_debug = false
is_lsan = true
strip_absolute_paths_from_debug_symbols = true
symbol_level = 1
use_goma = true
```

And I took perf stats with the following script.

```
#!/bin/bash
set -x

# Kill net_unittests after a timeout; some tests in net_unittests on a Linux corp machine seem to become very slow due to a corp-specific daemon.
time perf record --call-graph lbr timeout 10 ./net_unittests --test-launcher-batch-limit=1 --test-launcher-print-test-stdio=always
# Remove temporary files.
rm -rf /tmp/.org.chromium*
```


I also noticed that is_asan is not the only config causing this slowness.
dcheck_always_on adds significant CPU usage to some tests.

When I disabled DCHECKs, the max shard duration of browser_tests dropped from around 8 minutes to 5 minutes.
https://ci.chromium.org/p/chromium/builders/luci.chromium.try/linux_chromium_asan_rel_ng/44567

Most of the browser_tests slowness seems to come from around
https://cs.chromium.org/chromium/src/base/memory/weak_ptr.cc?l=44&rcl=fb40db1204f14ad0e420652787c4a70ce374c4a4
browser_tests.png (299 KB)
So I'm considering disabling dcheck_always_on on the CQ linux_chromium_asan_rel_ng bots.

I think enabling DCHECKs only on the waterfall asan bot is reasonable, because browser_tests from linux_chromium_asan_rel_ng is the largest resource consumer among swarming tasks.

https://screenshot.googleplex.com/twWBBGp11x6
Taken from the query below:
SELECT buildername, stepname, COUNT(1) AS cnt, SUM(started_ts - created_ts) AS pending_sum_s, SUM(completed_ts - started_ts) AS sum_s, AVG(completed_ts - started_ts) AS avg_s, MAX(completed_ts - started_ts) AS max_s
FROM chrome_infra.swarming_tasks.last7days, UNNEST(tags_stepname) as stepname, UNNEST(tags_buildername) as buildername, UNNEST(tags_master) as mastername, UNNEST(tags_patch_project) as patch_project
WHERE
  completed_ts is not null
  AND started_ts is not null
GROUP BY buildername, stepname
ORDER BY sum_s DESC
LIMIT 100;

Waterfall bots and its matching cq bot must share a config (for good reasons), so we can't have the cq bot not use dchecks without the main waterfall bot losing them too. But if it saves lots of resources, then disabling in both places is probably fine since the non-asan bots have dchecks enabled. Sounds like that's less than a 50% speedup, and asan/no-dchecks is still 10x slower than no-asan/dchecks.

kcc, can you check why asan builds are so much slower here with the steps from comment 36?
#38, for DCHECKs, we currently don't use the same config on CQ and waterfall, in order to detect misuse of DCHECK.
See https://groups.google.com/a/google.com/d/msg/chrome-client-infra/wny0cZELz6s/w032I7D7BwAJ

Here, I don't care whether dcheck is enabled or not if we can disable dcheck on CQ bot.
Huh, I thought that was impossible. Thanks for teaching me :-)

Anyways, disabling dchecks on the asan bot sounds fine to me, but it seems a bit like a workaround. The root issue is that asan is much slower than advertised here for some reason.

Comment 41 by bugdroid1@chromium.org, Jul 19

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/54671d97894b47be18a43c542538cdac934cf40c

commit 54671d97894b47be18a43c542538cdac934cf40c
Author: Takuto Ikuta <tikuta@chromium.org>
Date: Thu Jul 19 19:06:35 2018

Increase the number of shards for some tests

This is to make CQ check faster when update clang like below.
https://chromium-review.googlesource.com/c/chromium/src/+/1143094

This CL increases the number of shards so that max shard duration reduced around 30 minutes.

I use shard duration of following builds to adjust the number of shards.
* https://ci.chromium.org/p/chromium/builders/luci.chromium.try/linux_chromium_chromeos_asan_rel_ng/15713
* https://ci.chromium.org/p/chromium/builders/luci.chromium.try/linux_chromium_chromeos_msan_rel_ng/831

Bug: 794372, 865455
Change-Id: Iee4e55af62219ba6d4f36d3ff5bcf53e5b7e9725
Reviewed-on: https://chromium-review.googlesource.com/1143435
Commit-Queue: Takuto Ikuta <tikuta@chromium.org>
Reviewed-by: Nico Weber <thakis@chromium.org>
Cr-Commit-Position: refs/heads/master@{#576584}
[modify] https://crrev.com/54671d97894b47be18a43c542538cdac934cf40c/testing/buildbot/chromium.memory.json
[modify] https://crrev.com/54671d97894b47be18a43c542538cdac934cf40c/testing/buildbot/test_suite_exceptions.pyl


Comment 42 by bugdroid1@chromium.org, Jul 27

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/412ad67c04bb591f9f9f7dafaeefe42b2c7201b7

commit 412ad67c04bb591f9f9f7dafaeefe42b2c7201b7
Author: Scott Violet <sky@chromium.org>
Date: Fri Jul 27 20:54:06 2018

chromeos: increase number of shards for mash_browsertests

The number of shards for browser_tests on
"Linux Chromium OS ASan LSan Tests (1)" was increased to 30 a while back,
but not mash_browser_tests. This means mash_browser_tests often timeout
on the bot. I'm upping the number of shards to be the same for the two.

BUG=794372
TEST=none

Change-Id: I24dd03bb0c61a65d34dfb290af1269026f9563a3
Reviewed-on: https://chromium-review.googlesource.com/1153549
Reviewed-by: Dirk Pranke <dpranke@chromium.org>
Reviewed-by: John Budorick <jbudorick@chromium.org>
Commit-Queue: Scott Violet <sky@chromium.org>
Cr-Commit-Position: refs/heads/master@{#578768}
[modify] https://crrev.com/412ad67c04bb591f9f9f7dafaeefe42b2c7201b7/testing/buildbot/chromium.memory.json
[modify] https://crrev.com/412ad67c04bb591f9f9f7dafaeefe42b2c7201b7/testing/buildbot/test_suite_exceptions.pyl


Comment 43 by bugdroid1@chromium.org, Jul 30

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/98759ece03864c3241a7d46225c4ee98e289d2ce

commit 98759ece03864c3241a7d46225c4ee98e289d2ce
Author: James Cook <jamescook@chromium.org>
Date: Mon Jul 30 22:02:13 2018

chromeos: increase shards count for LSAN viz_browser_tests

The number of shards for browser_tests on
"Linux Chromium OS ASan LSan Tests (1)" was increased to 30 a while back,
but not viz_browser_tests. This means viz_browser_tests often timeout
on the bot. I'm upping the number of shards to be the same for the two.

(sky@ recently did something similar for mash_browser_tests)

BUG=794372
TEST=none

Change-Id: Ibbdb086bc37e8a4722a0e5ab0eeb6312a8db4741
Reviewed-on: https://chromium-review.googlesource.com/1155634
Reviewed-by: Dirk Pranke <dpranke@chromium.org>
Commit-Queue: James Cook <jamescook@chromium.org>
Cr-Commit-Position: refs/heads/master@{#579186}
[modify] https://crrev.com/98759ece03864c3241a7d46225c4ee98e289d2ce/testing/buildbot/chromium.memory.json
[modify] https://crrev.com/98759ece03864c3241a7d46225c4ee98e289d2ce/testing/buildbot/test_suite_exceptions.pyl


Comment 44 by bugdroid1@chromium.org, Sep 20

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/b86f625a6dacfbc4588a05d7c312a9e2de26a5e8

commit b86f625a6dacfbc4588a05d7c312a9e2de26a5e8
Author: John Budorick <jbudorick@chromium.org>
Date: Thu Sep 20 21:42:04 2018

Move test-launcher-* args for sanitizer tests src-side.

Bug: 794372
Change-Id: I6b6d81ab0c72fd1b781d03e242f78ddef24fef95
Reviewed-on: https://chromium-review.googlesource.com/1235015
Reviewed-by: Stephen Martinis <martiniss@chromium.org>
Commit-Queue: John Budorick <jbudorick@chromium.org>
Cr-Commit-Position: refs/heads/master@{#592954}
[modify] https://crrev.com/b86f625a6dacfbc4588a05d7c312a9e2de26a5e8/testing/buildbot/chromium.clang.json
[modify] https://crrev.com/b86f625a6dacfbc4588a05d7c312a9e2de26a5e8/testing/buildbot/chromium.memory.json
[modify] https://crrev.com/b86f625a6dacfbc4588a05d7c312a9e2de26a5e8/testing/buildbot/waterfalls.pyl


Comment 45 by bugdroid1@chromium.org, Sep 20

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/tools/build/+/e61705ca04e5fceffee9019eae57dc97fc02a17a

commit e61705ca04e5fceffee9019eae57dc97fc02a17a
Author: John Budorick <jbudorick@chromium.org>
Date: Thu Sep 20 23:36:06 2018

chromium: clean up ASAN configs & move test launcher args to src.

Bug: 794372
Change-Id: I049b74c368b393d7b2b2ffc8cc1af13c5d2d280c
Reviewed-on: https://chromium-review.googlesource.com/1236539
Reviewed-by: Stephen Martinis <martiniss@chromium.org>
Commit-Queue: John Budorick <jbudorick@chromium.org>

[modify] https://crrev.com/e61705ca04e5fceffee9019eae57dc97fc02a17a/scripts/slave/recipes/chromium.expected/msan.json
[modify] https://crrev.com/e61705ca04e5fceffee9019eae57dc97fc02a17a/scripts/slave/recipe_modules/chromium_tests/chromium_memory.py
[modify] https://crrev.com/e61705ca04e5fceffee9019eae57dc97fc02a17a/scripts/slave/recipe_modules/chromium_tests/client_v8_fyi.py
[modify] https://crrev.com/e61705ca04e5fceffee9019eae57dc97fc02a17a/scripts/slave/recipes/chromium.expected/dynamic_gtest_memory_asan_no_lsan.json
[modify] https://crrev.com/e61705ca04e5fceffee9019eae57dc97fc02a17a/scripts/slave/recipe_modules/chromium/tests/runtest.expected/msan.json
[modify] https://crrev.com/e61705ca04e5fceffee9019eae57dc97fc02a17a/scripts/slave/recipe_modules/chromium_tests/chromium_lkgr.py
[modify] https://crrev.com/e61705ca04e5fceffee9019eae57dc97fc02a17a/scripts/slave/recipe_modules/chromium/config.py
[modify] https://crrev.com/e61705ca04e5fceffee9019eae57dc97fc02a17a/scripts/slave/recipe_modules/chromium/tests/configs.py
[modify] https://crrev.com/e61705ca04e5fceffee9019eae57dc97fc02a17a/scripts/slave/recipes/chromium.expected/dynamic_gtest_memory_mac64.json
[modify] https://crrev.com/e61705ca04e5fceffee9019eae57dc97fc02a17a/scripts/slave/recipes/chromium.expected/tsan.json

I finally got around to c#14. The CL removing --test-launcher-batch-limit=1 is here: https://chromium-review.googlesource.com/c/chromium/src/+/1237409

Its try run of linux_chromium_asan_rel_ng (https://ci.chromium.org/p/chromium/builders/luci.chromium.try/linux_chromium_asan_rel_ng/103915) has some interesting results:
 - blink_heap_unittests: <10s, compared to a current median ~50s
 - net_unittests: ~60s per shard x 15 shards = 900s, compared to a current median >6000s
 - webkit_unit_tests: ~60s per shard x 5 shards = 300s, compared to a current median >2000s

(current values from http://shortn/_ywjdqcSlxq)
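
One way to see why the flag could matter so much is a rough model of process launches. It assumes that --test-launcher-batch-limit caps the number of tests each child process runs, so a limit of 1 means one process startup per test. The per-process startup cost and the alternative batch limit of 10 are hypothetical values chosen for illustration, not measurements, and the model ignores sharding, parallelism and retries.

```python
import math


def child_process_launches(num_tests, batch_limit):
    """Child processes needed when each one runs at most batch_limit tests."""
    return math.ceil(num_tests / batch_limit)


def startup_overhead_s(num_tests, batch_limit, startup_cost_s):
    """Total time spent on process startup alone, in seconds."""
    return child_process_launches(num_tests, batch_limit) * startup_cost_s


NUM_TESTS = 24028      # test count from the local net_unittests log in comment 9
STARTUP_COST_S = 0.2   # hypothetical per-process ASan startup cost, not measured

for batch_limit in (1, 10):  # 10 is an illustrative value only
    launches = child_process_launches(NUM_TESTS, batch_limit)
    overhead = startup_overhead_s(NUM_TESTS, batch_limit, STARTUP_COST_S)
    print(f"batch_limit={batch_limit}: {launches} launches, ~{overhead:.0f}s of startup")
```

Under these assumptions, startup overhead shrinks in proportion to the batch limit, which matches the direction of the try-run results above, though not necessarily their magnitude.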
Looks like a great improvement!

Will you remove --test-launcher-batch-limit=1 entirely, or only from the asan CQ bots?
I'm a bit concerned about cases where tests run in the same batch affect each other's behavior.
Planning to remove all current uses of the flag from the bots. That is a concern, though I don't think maintaining --test-launcher-batch-limit=1 is a good long-term way to handle it.

Comment 49 by bugdroid1@chromium.org, Sep 21

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/ff1be83fd7a40f2c96e08d7f448f6907b014294b

commit ff1be83fd7a40f2c96e08d7f448f6907b014294b
Author: John Budorick <jbudorick@chromium.org>
Date: Fri Sep 21 15:28:21 2018

Remove --test-launcher-batch-limit=1 from sanitizer tests.

Bug: 794372
Change-Id: Ic68afc2f66f340cf807406c3446567f886cd005e
Reviewed-on: https://chromium-review.googlesource.com/1237409
Reviewed-by: Stephen Martinis <martiniss@chromium.org>
Commit-Queue: John Budorick <jbudorick@chromium.org>
Cr-Commit-Position: refs/heads/master@{#593192}
[modify] https://crrev.com/ff1be83fd7a40f2c96e08d7f448f6907b014294b/testing/buildbot/chromium.clang.json
[modify] https://crrev.com/ff1be83fd7a40f2c96e08d7f448f6907b014294b/testing/buildbot/chromium.memory.json
[modify] https://crrev.com/ff1be83fd7a40f2c96e08d7f448f6907b014294b/testing/buildbot/waterfalls.pyl
