Perf S5 CQ bots take too long
Issue description
In case of a timeout failure, a run can take up to 10 hours. https://build.chromium.org/p/tryserver.chromium.perf/builders/android_s5_perf_cq/builds/693
,
Nov 15 2016
We're talking about S5s (among other things) later today. Hold off on doing device switches for now.
,
Nov 15 2016
Thanks, John! Ned, do you think if we were able to get more devices + more reliable devices it would definitely be worth keeping these on the CQ? Any idea how many devices we'd need? +robertocn, dtu: do the CQ bots require 1 host per device?
,
Nov 15 2016
From a stability point of view, we run our smoke tests on Android Nexus 5 & Nexus 5X on the CQ regularly, but not on Samsung devices. Given that we are not going to have android_s5_rel_ng anytime soon, and given the high failure rate we have seen so far, I don't think it's worth keeping the Samsung S5 on CQ_EXTRA_TRYBOT.
,
Nov 15 2016
Removing the S5s sgtm.
,
Nov 15 2016
To echo John's comment in #5, we are going to remove the S5s from the perf waterfall until all the nexus devices we have there are more stable, and then re-evaluate adding more non-nexus devices. So we should replace the S5 CQ bots we have with a more stable device, and get some redundancy. Does N5X seem like a good pick to everyone? Ned, how many? Roberto, do we need 1:1 host:device or could we have multiple devices used at the same time on one host?
,
Nov 15 2016
In the short term, I would say a single configuration of 1 host + 7 devices is more than enough. These only get triggered when people create new benchmarks/change benchmarks which is not super often. In the longer run: once we have swarming everywhere, I would advocate using the same swarming pool we use for bisect.
,
Nov 16 2016
Roberto, Dave, can we run CQ jobs in parallel if a bot has multiple devices? I don't think we can, which would mean we'd need to set up multiple hosts. If so, Ned, what is the minimum redundancy you think we need?
,
Nov 16 2016
Annie: I would look at how often several people happen to change a benchmark at the same time. "git log --since=9/16/2016 --oneline -- tools/perf/benchmarks/ | wc -l" shows that we have had 56 commits to the benchmarks/ folder in the last 2 months, which works out to about 1 commit to the benchmarks/ folder per day. Assuming 1 host + 7 devices allows us to run any benchmark module in a reasonable amount of time [1], I think 2 hosts is good enough. [1] This only holds for benchmark modules that do not contain too many benchmarks. The smoothness.py module, for example, contains 29 smoothness benchmarks; running all of them takes a lot of time and increases the failure rate. The team should consider either (1) a better way of inferring which benchmarks should be smoke tested upon a change or (2) splitting smoothness.py into multiple files.
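For anyone who wants to redo the arithmetic in the comment above, here is a minimal Python sketch. The helper name `commits_per_day` is made up for illustration, and the end date is assumed to be the date of the comment (Nov 16 2016); only the count of 56 and the start date come from the thread.

```python
from datetime import date


def commits_per_day(commit_count: int, since: date, until: date) -> float:
    """Average number of commits per calendar day between two dates."""
    days = (until - since).days
    if days <= 0:
        raise ValueError("until must be later than since")
    return commit_count / days


# 56 is the count quoted from the git log command in the comment above.
# The end date is an assumption: the day the comment was posted.
rate = commits_per_day(56, date(2016, 9, 16), date(2016, 11, 16))
print(f"{rate:.2f} commits/day")
```

Under these assumptions the window is 61 days, so the rate comes out slightly below one commit per day, consistent with the estimate in the comment.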
,
Nov 16 2016
Emily is going to work on this.
,
Nov 22 2016
,
Nov 22 2016
,
Nov 22 2016
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/d5dd88f528645f6413aab414894d3eada2135614

commit d5dd88f528645f6413aab414894d3eada2135614
Author: charliea <charliea@chromium.org>
Date: Tue Nov 22 17:33:12 2016

Remove android_s5_perf_cq from tools/perf presubmits

eyaich@ is currently working on a more thorough dismantling of the Android S5 perfbots. In the meanwhile, we can stop requiring tools/perf changes to run through the (flaky) CQ.

BUG= 665529
Review-Url: https://codereview.chromium.org/2520353003
Cr-Commit-Position: refs/heads/master@{#433904}

[modify] https://crrev.com/d5dd88f528645f6413aab414894d3eada2135614/tools/perf/PRESUBMIT.py
,
Dec 1 2016
Emily is getting pretty swamped before her leave. Simon or Dave, would one of you be able to take a look at setting up more stable CQ bots? We should also talk about how this would work in pinpoint.
,
Dec 2 2016
Is the next step of this bug about removing samsung s5 trybot?
,
Dec 2 2016
My understanding is that Emily removed the samsung s5 trybot and the next steps are:
* To add a N5X one, ideally with some redundancy so that we can have multiple jobs in CQ at once
* To refactor the code to only run a maximum number of benchmarks so that a large refactor doesn't need to run for hours.
* Consider switching the desktop CQ bots to VMs to get more parallelism as well
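The second step above (only running a maximum number of benchmarks) could look roughly like the following sketch. This is not the actual perf CQ code: `cap_benchmarks`, the limit of 5, and the benchmark names are all hypothetical and only illustrate the idea of bounding the work a single change can trigger.

```python
from typing import List, Tuple


def cap_benchmarks(names: List[str], max_count: int) -> Tuple[List[str], List[str]]:
    """Split benchmark names into (to_run, skipped), running at most max_count.

    Names are sorted first so the same change always selects the same subset.
    """
    if max_count < 0:
        raise ValueError("max_count must be non-negative")
    ordered = sorted(names)
    return ordered[:max_count], ordered[max_count:]


# Hypothetical names standing in for the 29 benchmarks in one large module.
affected = [f"smoothness.case_{i:02d}" for i in range(29)]
to_run, skipped = cap_benchmarks(affected, max_count=5)
print(f"running {len(to_run)} benchmarks, skipping {len(skipped)}")
```

Taking the first entries of a sorted list is the simplest deterministic choice; a real implementation might prefer a random or rotating sample so that the same benchmarks are not skipped every time.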
,
Dec 2 2016
So no I never got started on this bug, but I just got out a CL to actually remove them from master.tryserver.chromium.perf: https://chromium-review.googlesource.com/c/415521/ After that a restart will take it off the waterfall and then I will file a ticket with labs to actually remove it from the lab.
,
Dec 2 2016
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/tools/build.git/+/00e9a1c210ed9aa63f5c5ebbb6073f289a21ea01

commit 00e9a1c210ed9aa63f5c5ebbb6073f289a21ea01
Author: Emily Hanley <eyaich@google.com>
Date: Fri Dec 02 16:37:55 2016

Removing android_s5_perf_cq bot from master.tryserver.chromium.perf

BUG= chromium:665529
Change-Id: I19fd87702c839eeb9e16fe19be66566ae0ac0e21
Reviewed-on: https://chromium-review.googlesource.com/415521
Commit-Queue: Emily Hanley <eyaich@chromium.org>
Reviewed-by: Mike Stipicevic <stip@chromium.org>

[modify] https://crrev.com/00e9a1c210ed9aa63f5c5ebbb6073f289a21ea01/masters/master.tryserver.chromium.perf/master.cfg
[modify] https://crrev.com/00e9a1c210ed9aa63f5c5ebbb6073f289a21ea01/masters/master.tryserver.chromium.perf/slaves.cfg
[delete] https://crrev.com/bfafef31482c11bfdf10fc56a00ce85623e020de/scripts/slave/recipes/bisection/android_bisect.expected/basic_perf_tryjob_android_s5_perf_cq.json
[delete] https://crrev.com/bfafef31482c11bfdf10fc56a00ce85623e020de/scripts/slave/recipes/bisection/android_bisect.expected/basic_perf_tryjob_with_metric_android_s5_perf_cq.json
[delete] https://crrev.com/bfafef31482c11bfdf10fc56a00ce85623e020de/scripts/slave/recipes/bisection/android_bisect.expected/basic_perf_tryjob_with_revisions_android_s5_perf_cq.json
[delete] https://crrev.com/bfafef31482c11bfdf10fc56a00ce85623e020de/scripts/slave/recipes/bisection/android_bisect.expected/basic_recipe_android_s5_perf_cq.json
[delete] https://crrev.com/bfafef31482c11bfdf10fc56a00ce85623e020de/scripts/slave/recipes/bisection/android_bisect.expected/perf_cq_no_benchmark_to_run_android_s5_perf_cq.json
[delete] https://crrev.com/bfafef31482c11bfdf10fc56a00ce85623e020de/scripts/slave/recipes/bisection/android_bisect.expected/perf_cq_no_changes_android_s5_perf_cq.json
[delete] https://crrev.com/bfafef31482c11bfdf10fc56a00ce85623e020de/scripts/slave/recipes/bisection/android_bisect.expected/perf_cq_run_benchmark_android_s5_perf_cq.json
[delete] https://crrev.com/bfafef31482c11bfdf10fc56a00ce85623e020de/scripts/slave/recipes/bisection/android_bisect.expected/perf_tryjob_config_error_android_s5_perf_cq.json
[delete] https://crrev.com/bfafef31482c11bfdf10fc56a00ce85623e020de/scripts/slave/recipes/bisection/android_bisect.expected/perf_tryjob_failed_test_android_s5_perf_cq.json
[modify] https://crrev.com/00e9a1c210ed9aa63f5c5ebbb6073f289a21ea01/scripts/slave/recipes/bisection/android_bisect.py
[delete] https://crrev.com/bfafef31482c11bfdf10fc56a00ce85623e020de/scripts/slave/recipes/bisection/android_bisect_staging.expected/basic_perf_tryjob_android_s5_perf_cq.json
[delete] https://crrev.com/bfafef31482c11bfdf10fc56a00ce85623e020de/scripts/slave/recipes/bisection/android_bisect_staging.expected/basic_perf_tryjob_with_metric_android_s5_perf_cq.json
[delete] https://crrev.com/bfafef31482c11bfdf10fc56a00ce85623e020de/scripts/slave/recipes/bisection/android_bisect_staging.expected/basic_perf_tryjob_with_revisions_android_s5_perf_cq.json
[delete] https://crrev.com/bfafef31482c11bfdf10fc56a00ce85623e020de/scripts/slave/recipes/bisection/android_bisect_staging.expected/basic_recipe_android_s5_perf_cq.json
[delete] https://crrev.com/bfafef31482c11bfdf10fc56a00ce85623e020de/scripts/slave/recipes/bisection/android_bisect_staging.expected/perf_cq_no_benchmark_to_run_android_s5_perf_cq.json
[delete] https://crrev.com/bfafef31482c11bfdf10fc56a00ce85623e020de/scripts/slave/recipes/bisection/android_bisect_staging.expected/perf_cq_no_changes_android_s5_perf_cq.json
[delete] https://crrev.com/bfafef31482c11bfdf10fc56a00ce85623e020de/scripts/slave/recipes/bisection/android_bisect_staging.expected/perf_cq_run_benchmark_android_s5_perf_cq.json
[delete] https://crrev.com/bfafef31482c11bfdf10fc56a00ce85623e020de/scripts/slave/recipes/bisection/android_bisect_staging.expected/perf_tryjob_config_error_android_s5_perf_cq.json
[delete] https://crrev.com/bfafef31482c11bfdf10fc56a00ce85623e020de/scripts/slave/recipes/bisection/android_bisect_staging.expected/perf_tryjob_failed_test_android_s5_perf_cq.json
[modify] https://crrev.com/00e9a1c210ed9aa63f5c5ebbb6073f289a21ea01/scripts/slave/recipes/bisection/android_bisect_staging.py
,
Dec 6 2016
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/6f9637e257b44580340ea9c0b8b96920c093db7c

commit 6f9637e257b44580340ea9c0b8b96920c093db7c
Author: eyaich <eyaich@chromium.org>
Date: Tue Dec 06 14:14:45 2016

Removing android_s5_perf_cq bot from mb config map

BUG= chromium:665529
Review-Url: https://codereview.chromium.org/2550713002
Cr-Commit-Position: refs/heads/master@{#436586}

[modify] https://crrev.com/6f9637e257b44580340ea9c0b8b96920c093db7c/tools/mb/mb_config.pyl
,
Dec 12 2016
,
Dec 12 2016
This bot is officially removed from the configuration, so someone is free to take on the next steps that Annie outlined.
,
Feb 2 2017
,
Feb 2 2017
The following revision refers to this bug:
https://chrome-internal.googlesource.com/chrome-golo/chrome-golo/+/8043883b68b06bde93d81f864b708a4c00ad6e60

commit 8043883b68b06bde93d81f864b708a4c00ad6e60
Author: Peter Schmidt <pschmidt@google.com>
Date: Thu Feb 02 18:54:39 2017
Comment 1 by nedngu...@google.com
, Nov 15 2016