Redo linux_chromium_asan_rel_ng sharding
Issue description
This is usually the slowest bot; browser_tests shards appear to take about 30 minutes, when they should be around 10. I'm not sure why it got so much slower. Did ASan itself get slower?
,
Dec 12 2017
,
Dec 12 2017
Attached: shard percentiles over the entire period for which we have data in chrome_infra.swarming_tasks. The data only goes back six months, but it shows gradual growth in shard time rather than a single regression or a small group of regressions.
,
Dec 12 2017
The ASan bot seems to be much slower than other bots. The swarming section of https://viceroy.corp.google.com/chrome_infra/Buildbot/buildbot?refresh=-1&duration=1d&job=master.tryserver.chromium.linux&builder=linux_chromium_asan_rel_ng shows runtimes for several test suites, all of which take much longer on ASan than on other configurations. I've confirmed this manually for the top four problems (components_unittests, webkit_unit_tests, net_unittests, and content_browsertests), but not for browser_tests.
browser_tests does take longer on this builder, but the difference is not as large, and as far as I can tell it is not what limits the builder in most cases. I can double-check this, but based on the graph in the Viceroy console above, the other tests take longer and slow down the builder.
I also made the same graph as John. It looks like the runtimes have regressed over time.
,
Dec 12 2017
Can someone take care of changing the sharding so this bot isn't so slow?
,
Dec 12 2017
yes, I'll get to it.
,
Dec 12 2017
Interesting that we were looking at this independently.
I ran the following query in Dremel to compare the swarming task times for a regular Linux build and the ASan build from the same time:
SELECT
  tags_master AS master,
  tags_buildername AS builder,
  tags_stepname AS stepname,
  SUM(completed_ts - started_ts) AS dur,
  SUM(cost_usd) AS cost,
  (SUM(cost_usd) / COUNT(DISTINCT tags_build_id)) AS cost_per_build
FROM
  FLATTEN(FLATTEN(FLATTEN(FLATTEN(FLATTEN(chrome_infra.swarming_tasks.yesterday, tags_project),
                                  tags_master),
                          tags_buildername),
                  tags_stepname),
          tags_build_id)
WHERE state = 'COMPLETED'
  AND tags_project = 'chromium'
  AND ((tags_master = 'chromium.linux' AND tags_buildername = 'Linux Tests' AND tags_build_id = '65487') OR
       (tags_master = 'chromium.memory' AND tags_buildername = 'Linux ASan LSan Tests (1)' AND tags_build_id = '40806'))
  AND completed_ts > (PARSE_UTC_USEC('2017-12-11') / 1000000)
  AND completed_ts < (PARSE_UTC_USEC('2017-12-11') / 1000000) + 86400
GROUP BY master, builder, stepname
ORDER BY master ASC, builder ASC, stepname ASC
https://ci.chromium.org/buildbot/chromium.linux/Linux%20Tests/65487
https://ci.chromium.org/buildbot/chromium.memory/Linux%20ASan%20LSan%20Tests%20%281%29/40807
There are steps that are 20x to 100x slower (or more); e.g. webkit_unit_tests goes from 20s to 40m!
I don't think simply resharding is the right answer (perhaps obviously :). We need to file bugs and possibly disable test steps until we figure out why things are so much slower.
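To make the per-step comparison concrete, here is a minimal sketch of how the query output for the two builds could be turned into slowdown ratios. It assumes the `dur` column has already been loaded into two plain dictionaries keyed by step name, with durations in seconds; the function name `slowdown_by_step` and the input shape are hypothetical and not part of any Chromium tooling.

```python
def slowdown_by_step(baseline_secs, asan_secs):
    """Return {stepname: asan_duration / baseline_duration}, slowest first.

    Both arguments are hypothetical dicts mapping step name to total
    duration in seconds, e.g. built from the `stepname` and `dur` columns
    of the query for each builder. Steps missing from either build, or
    with a zero baseline, are skipped.
    """
    ratios = {}
    for step, base in baseline_secs.items():
        if step in asan_secs and base > 0:
            ratios[step] = asan_secs[step] / base
    # Sort so the worst offenders come first.
    return dict(sorted(ratios.items(), key=lambda kv: kv[1], reverse=True))


# The one data point quoted in this thread: webkit_unit_tests at 20s vs 40m.
ratios = slowdown_by_step(
    {"webkit_unit_tests": 20},
    {"webkit_unit_tests": 40 * 60},
)
print(ratios)
```

With the figures quoted above, webkit_unit_tests comes out at a 120x slowdown, which fits the "100x or more" end of the range.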
,
Dec 12 2017
I don't think that *solely* resharding is the right answer, but I'd suspect that it will have minimal effect on total runtime while reducing user-visible runtime and as such may be worthwhile as a stopgap. (Disabling suites, or the bot, would do that too, but both are a bit more drastic.)
,
Dec 13 2017
I agree. I'm going to file a separate bug for the overall "figure out what the heck is wrong w/ asan" problem, and we can leave this for the short-term fixes.
,
Dec 13 2017
filed bug 794372 for the larger issue.
,
Dec 13 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/5052d557b432ddf90d80f73a428418550dd7b24f

commit 5052d557b432ddf90d80f73a428418550dd7b24f
Author: John Budorick <jbudorick@chromium.org>
Date: Wed Dec 13 02:59:34 2017

Reshard egregiously long suites on linux_chromium_asan_rel_ng.

This reshards the suites w/ 90th percentile task times > 10 minutes on linux_chromium_asan_rel_ng.

Bug: 793993
Change-Id: Id269a3a2466af43956e11b00e25637e02ba5f410
Reviewed-on: https://chromium-review.googlesource.com/822183
Commit-Queue: John Budorick <jbudorick@chromium.org>
Reviewed-by: Dirk Pranke <dpranke@chromium.org>
Cr-Commit-Position: refs/heads/master@{#523668}

[modify] https://crrev.com/5052d557b432ddf90d80f73a428418550dd7b24f/testing/buildbot/chromium.memory.json
[modify] https://crrev.com/5052d557b432ddf90d80f73a428418550dd7b24f/testing/buildbot/test_suite_exceptions.pyl
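The selection rule in that commit message (90th percentile task time over 10 minutes) can be illustrated with a short sketch. This is not the script used for the change; the helper names, the nearest-rank percentile definition, and the assumption that shard time scales inversely with shard count are all illustrative simplifications.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list (pct in 0..100)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def suggest_shards(task_minutes, current_shards, target_minutes=10.0):
    """Suggest a shard count that brings the p90 task time under the target.

    Assumes (hypothetically) that per-shard time shrinks in proportion to
    the number of shards, which ignores fixed per-shard setup overhead.
    """
    p90 = percentile(task_minutes, 90)
    if p90 <= target_minutes:
        return current_shards  # Already within the target; leave it alone.
    return math.ceil(current_shards * p90 / target_minutes)


# Hypothetical suite: 10 shards that each take about 30 minutes.
print(suggest_shards([30.0] * 10, current_shards=10))
```

Because of per-shard overhead, a real suite would likely need somewhat more shards than this linear estimate suggests, so the result is best treated as a starting point to check against the percentile graphs.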
,
Dec 13 2017
50th and 90th percentiles of browser_tests shard time by hour
,
Dec 13 2017
50th and 90th percentiles of webkit_unit_tests shard time by hour
,
Dec 13 2017
Per #10, investigation into the underlying cause of unexpected slowness on the ASan bot will continue over in issue 794372. The immediate intervention here is done, though.
Comment 1 by jbudorick@chromium.org, Dec 12 2017
Components: Infra>Client>Chrome
Labels: OS-Android