Infra failure in performance_browser_tests on Win 7 Nvidia Perf, Win 7 Perf: shard timeouts.
Issue description
Filed by sheriff-o-matic@appspot.gserviceaccount.com on behalf of sullivan@google.com

performance_browser_tests failing on multiple builders

Builders failed on:
- Win 7 Nvidia GPU Perf: https://ci.chromium.org/p/chrome/builders/luci.chrome.ci/Win%207%20Nvidia%20GPU%20Perf
- Win 7 Perf: https://ci.chromium.org/p/chrome/builders/luci.chrome.ci/Win%207%20Perf

Can the trooper take a look?
,
Nov 20
Ushesh, do you know who owns perf waterfall runtimes? Should we just increase the timeout?
,
Nov 20
Not my area anymore, but my 2 cents: we should either add hardware or tell teams to remove tests. Increasing the timeout will lead to a slower waterfall, which comes with all sorts of problems: harder bisects, delayed sheriff reaction, ...
,
Nov 20
,
Nov 20
Is there anyone to investigate what has caused the runtime increase? I would not expect that to be a trooper task...
,
Nov 20
I am not sure whether the investigation of runtime increase would fall under CCA's or CCI's responsibility. Ushesh, any idea?
,
Nov 21
I also don't think this would be a CCI responsibility, fwiw.
,
Nov 21
How does the lack of testing capacity not involve CCI?
,
Nov 21
#8: sorry, I should have been more clear in #7. I don't think *investigating runtime increase* would be a CCI responsibility.
,
Nov 21
#9: that seems reasonable. Investigating the runtime increase seems to fall on the speed TPM's plate, imo.
,
Nov 21
Ushesh, can you look into this?
,
Nov 21
Addressing some points raised above - 1. I don't know who specifically owns the perf waterfall runtimes. If I hadn't read through this bug, I would have assumed Ned but that doesn't seem to be the case any more. I feel like this would be whoever owns the perf bots and now this leads to my point 2. 2. To me, this is going back to the conversation we had about a month ago regarding the ownership of perf infrastructure in the long term and what the expectation is, currently, of CCI (which was determined to be fairly specific and narrow) 3. I can start looking into the runtime increase but don't really know enough about the infrastructure to go into super detail. I can do some basic analysis but will need to reach out to folks who have specific knowledge of these systems.
,
Nov 21
CCI doesn't canonically own investigation of functional test runtime regressions, either. (When we do investigate, we're biased toward turning things off.) I view this as more of a sheriff responsibility, in the same vein as a regression in the result of a test: a functional test failing, or a performance test regressing.
,
Nov 21
The fail vs regression comparison makes sense to me. Since this is a longer term trend, I think it would still be me looking into it as opposed to a sheriff responsibility for the perf bot folks (probably will work with them though)
,
Nov 21
,
Nov 21
I understand that we don't want to increase the timeout long term, but that seems to be the only short-term solution to this problem. Then ushesh@ will be able to investigate the runtime increase carefully, and together we can come up with a thoughtful, systematic approach rather than being rushed by this timeout failure. In the meantime, let's keep the priority at P1 until we can figure out a long-term solution.
,
Nov 21
I spoke with Ned and I am no longer so sure. We can increase the per-shard expiration here https://cs.chromium.org/chromium/src/tools/perf/core/perf_data_generator.py?rcl=c416f80a900acf43478a6ad469ed0fa4af2e35aa&l=973 but that will increase it for all platforms. Our target is to keep platforms below a 1 hour cycle time, and the expiration is currently set to 2 hours. If we increase this to, for example, 3 hours, will we notice if cycle time jumps up by 30 minutes? I don't like the hard failure being our alert for this problem, but maybe that is needed until we can set up softer alerts. Long term, we will need someone (maybe ushesh@?) to monitor these metrics and do things like order new devices ahead of time.
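As a rough illustration of the "softer alerts" idea, here is a minimal sketch that separates a warning from a hard timeout. The 1 hour target and 2 hour expiration come from the comment above; the function name and the three status labels are hypothetical and not part of any existing perf tooling.

```python
from datetime import timedelta

# Values quoted in the thread: the cycle-time target and the current
# per-shard expiration. Everything else here is an illustrative assumption.
TARGET_CYCLE_TIME = timedelta(hours=1)
HARD_EXPIRATION = timedelta(hours=2)


def classify_cycle_time(cycle_time,
                        target=TARGET_CYCLE_TIME,
                        hard_limit=HARD_EXPIRATION):
    """Return 'ok', 'warn', or 'timeout' for a measured cycle time."""
    if cycle_time >= hard_limit:
        # This is the existing behaviour: the shard expires and the build fails.
        return "timeout"
    if cycle_time > target:
        # Soft alert: over the target but still below the hard limit.
        return "warn"
    return "ok"


if __name__ == "__main__":
    for minutes in (45, 90, 150):
        print(minutes, classify_cycle_time(timedelta(minutes=minutes)))
```

With a check like this, raising the hard expiration would not hide a 30 minute jump, because the warning is tied to the target rather than to the expiration.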
,
Nov 26
I definitely think it is worth spending some time on monitoring the run time of these tests on the infrastructure over time and being aware of what the increases are and why. I think that was the next step to make sure we maintained our run times and had a good algorithm for obtaining new hardware and determining if we can add more tests to the suite. -1 to increasing the timeout before we evaluate this.
,
Nov 26
Note that we have this script to fetch the benchmark & story runtime: https://cs.chromium.org/chromium/src/tools/perf/core/retrieve_story_timing.py
We can easily add datetime as an extra field to the output.
,
Nov 26
Right, but we still need to keep track of these and provide useful analysis, both to alert and to answer the basic questions: 1) How many tests do we have, and can we add more? 2) How much infrastructure do we need? Storing this data and monitoring it is the part we never finalized a design around.
,
Nov 26
If there isn't a tool that does this on the CCI side already, perf dashboard can store this data and alert on it.
,
Nov 29
While ushesh@ is working on the long term plan, I will make the bandaid fix to increase timeouts of shard & builds.
,
Nov 29
The following revision refers to this bug: https://chrome-internal.googlesource.com/chrome/src-internal.git/+/e35b8ef61038d0ef826d0dfe626f41c583007a90 commit e35b8ef61038d0ef826d0dfe626f41c583007a90 Author: Nghia Nguyen <nednguyen@google.com> Date: Thu Nov 29 15:30:31 2018
,
Nov 29
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/6f7afe50480ccab2f1eaf7bb56c56f59a2a1c181 commit 6f7afe50480ccab2f1eaf7bb56c56f59a2a1c181 Author: Ned Nguyen <nednguyen@google.com> Date: Thu Nov 29 17:09:24 2018 Increase shard timeout limit for performance_test_suite to 10 hours Bug: 906654 Change-Id: I08817df22c5043ec92ae51b7deeae15ceb1877ee Reviewed-on: https://chromium-review.googlesource.com/c/1354690 Reviewed-by: Emily Hanley <eyaich@chromium.org> Reviewed-by: John Budorick <jbudorick@chromium.org> Commit-Queue: Ned Nguyen <nednguyen@google.com> Cr-Commit-Position: refs/heads/master@{#612239} [modify] https://crrev.com/6f7afe50480ccab2f1eaf7bb56c56f59a2a1c181/testing/buildbot/chromium.perf.fyi.json [modify] https://crrev.com/6f7afe50480ccab2f1eaf7bb56c56f59a2a1c181/testing/buildbot/chromium.perf.json [modify] https://crrev.com/6f7afe50480ccab2f1eaf7bb56c56f59a2a1c181/tools/perf/core/perf_data_generator.py [modify] https://crrev.com/6f7afe50480ccab2f1eaf7bb56c56f59a2a1c181/tools/perf/core/perf_data_generator_unittest.py
,
Nov 29
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/2738bf79ec9fe52db59c19e2891a9f49acb8b4d1 commit 2738bf79ec9fe52db59c19e2891a9f49acb8b4d1 Author: chromium-internal-autoroll <chromium-internal-autoroll@skia-corp.google.com.iam.gserviceaccount.com> Date: Thu Nov 29 17:21:43 2018 Roll src-internal 73b3bde2659f..f38fffb7aa2f (2 commits) https://chrome-internal.googlesource.com/chrome/src-internal.git/+log/73b3bde2659f..f38fffb7aa2f Created with: gclient setdep -r src-internal@f38fffb7aa2f The AutoRoll server is located here: https://autoroll-internal.skia.org/r/src-internal-chromium-autoroll Documentation for the AutoRoller is here: https://skia.googlesource.com/buildbot/+/master/autoroll/README.md If the roll is causing failures, please contact the current sheriff, who should be CC'd on the roll, and stop the roller if necessary. BUG=chromium:906654 TBR=mmoss@chromium.org Change-Id: I781599f8fc3ee65a6151606e25b935421aaeb03f Reviewed-on: https://chromium-review.googlesource.com/c/1355360 Reviewed-by: chromium-internal-autoroll <chromium-internal-autoroll@skia-corp.google.com.iam.gserviceaccount.com> Commit-Queue: chromium-internal-autoroll <chromium-internal-autoroll@skia-corp.google.com.iam.gserviceaccount.com> Cr-Commit-Position: refs/heads/master@{#612247} [modify] https://crrev.com/2738bf79ec9fe52db59c19e2891a9f49acb8b4d1/DEPS
,
Dec 3
Update: the suite is now working again.

https://ci.chromium.org/p/chrome/builders/luci.chrome.ci/Win%207%20Perf/3611
performance_test_suite on (102b) GPU on Windows on Windows-2008ServerR2-SP1
Run on OS: 'Windows-2008ServerR2-SP1'
Max pending time: 0:20:39.283899 (shard #0)
Max shard duration: 8:13:31.165188 (shard #3)
Min shard duration: 4:56:34.180917 (shard #0)
Total tests: 1228
* Passed: 1105 (1105 expected, 0 unexpected)
* Skipped: 121 (121 expected, 0 unexpected)
* Failed: 2 (0 expected, >>>2 unexpected<<<)
* Flaky: 0 (0 expected, 0 unexpected)
Unexpected Failures:
* rendering.desktop/css_animations_staggered_style_element
* system_health.common_desktop/multitab:misc:typical24:2018

performance_test_suite on NVIDIA GPU on Windows on Windows-2008ServerR2-SP1
Run on OS: 'Windows-2008ServerR2-SP1'
Max pending time: 0:14:12.713929 (shard #3)
Max shard duration: 8:09:39.391104 (shard #3)
Min shard duration: 4:45:37.913856 (shard #0)
Total tests: 1228
* Passed: 1175 (1175 expected, 0 unexpected)
* Skipped: 50 (50 expected, 0 unexpected)
* Failed: 3 (0 expected, >>>3 unexpected<<<)
* Flaky: 0 (0 expected, 0 unexpected)
Unexpected Failures:
* speedometer2-future/Speedometer2
* system_health.common_desktop/multitab:misc:typical24:2018
* tracing.tracing_with_background_memory_infra/http://www.amazon.com
(https://ci.chromium.org/p/chrome/builders/luci.chrome.ci/Win%207%20Nvidia%20GPU%20Perf/3426)

The shards are badly unbalanced. I will send a CL to reshard "Win 7 Nvidia Perf" & "Win 7 Perf".
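To put a number on the imbalance, the sketch below parses the max and min shard durations reported above and computes the gap and the ratio between them. The helper names are hypothetical; only the duration strings come from the build summaries.

```python
from datetime import timedelta


def parse_duration(text):
    """Parse an 'H:MM:SS.ffffff' string, as printed in the build summary."""
    hours, minutes, seconds = text.split(":")
    return timedelta(hours=int(hours), minutes=int(minutes),
                     seconds=float(seconds))


def imbalance(max_text, min_text):
    """Return (gap, ratio) between the slowest and fastest shard."""
    longest = parse_duration(max_text)
    shortest = parse_duration(min_text)
    # Dividing two timedeltas yields a plain float in Python 3.
    return longest - shortest, longest / shortest


if __name__ == "__main__":
    reported = {
        "Win 7 Perf": ("8:13:31.165188", "4:56:34.180917"),
        "Win 7 Nvidia GPU Perf": ("8:09:39.391104", "4:45:37.913856"),
    }
    for builder, (longest, shortest) in reported.items():
        gap, ratio = imbalance(longest, shortest)
        print(f"{builder}: gap={gap}, ratio={ratio:.2f}")
```

On both builders the slowest shard runs more than three hours longer than the fastest one, which is the imbalance the reshard is meant to remove.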
,
Dec 3
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/ce92577b8b416e1a5566333a846488ce79dc0fd0

commit ce92577b8b416e1a5566333a846488ce79dc0fd0
Author: Ned Nguyen <nednguyen@google.com>
Date: Mon Dec 03 23:28:21 2018

Update the timing & shard maps for win 7 perf bots

** NOTE TO PERF SHERIFF **: this CL might cause false regressions on "Win 7 Nvidia GPU Perf" & "Win 7 Perf" that are safe to ignore

Commandline:
$ ./tools/perf/generate_perf_sharding update --builders="Win 7 Nvidia GPU Perf" -r
$ ./tools/perf/generate_perf_sharding update --builders="Win 7 Perf" -r

Bug:906654
Change-Id: I420051f0629a8b8c79ac2e4da6cc736e911b02bb
NOTRY=true # ios-simulator flake
Change-Id: I420051f0629a8b8c79ac2e4da6cc736e911b02bb
Reviewed-on: https://chromium-review.googlesource.com/c/1358911
Commit-Queue: Ned Nguyen <nednguyen@google.com>
Reviewed-by: Emily Hanley <eyaich@chromium.org>
Reviewed-by: Caleb Rouleau <crouleau@chromium.org>
Cr-Commit-Position: refs/heads/master@{#613315}

[modify] https://crrev.com/ce92577b8b416e1a5566333a846488ce79dc0fd0/tools/perf/core/shard_maps/timing_data/win_7_nvidia_gpu_perf_timing.json
[modify] https://crrev.com/ce92577b8b416e1a5566333a846488ce79dc0fd0/tools/perf/core/shard_maps/timing_data/win_7_perf_timing.json
[modify] https://crrev.com/ce92577b8b416e1a5566333a846488ce79dc0fd0/tools/perf/core/shard_maps/win_7_nvidia_gpu_perf_map.json
[modify] https://crrev.com/ce92577b8b416e1a5566333a846488ce79dc0fd0/tools/perf/core/shard_maps/win_7_perf_map.json
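For readers unfamiliar with timing-based resharding, here is a minimal sketch of one common approach: assign stories, longest first, to whichever shard currently has the least work. This is an assumption for illustration only; it is not taken from generate_perf_sharding, whose actual algorithm may differ, and the story names and timings are made up.

```python
import heapq


def greedy_shards(story_seconds, num_shards):
    """Distribute stories over shards using longest-first greedy assignment.

    story_seconds maps a story name to its measured runtime in seconds.
    Returns a list of num_shards lists of story names.
    """
    # Min-heap of (current load, shard index); the lightest shard pops first.
    heap = [(0.0, index) for index in range(num_shards)]
    heapq.heapify(heap)
    shards = [[] for _ in range(num_shards)]
    # Sort by descending runtime, then by name so the result is deterministic.
    ordered = sorted(story_seconds.items(), key=lambda item: (-item[1], item[0]))
    for story, seconds in ordered:
        load, index = heapq.heappop(heap)
        shards[index].append(story)
        heapq.heappush(heap, (load + seconds, index))
    return shards


if __name__ == "__main__":
    # Hypothetical timings, not real benchmark data.
    timings = {"a": 50, "b": 30, "c": 20, "d": 20, "e": 10}
    print(greedy_shards(timings, 2))
```

The greedy rule does not guarantee an optimal split, but it keeps the slowest shard close to the average when no single story dominates the total runtime.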
,
Dec 8
I did an analysis of where the runtime increase on Win 7 Perf came from between Oct 6, 2018 and Dec 6, 2018.

I used this code to get the data: https://chromium-review.googlesource.com/c/chromium/src/+/1369051
My data is at https://docs.google.com/spreadsheets/d/1Vi37tr3kmhWQ07LYogpaAcvzxrOxoUuHNOplQEfFdFc/edit#gid=573964993

Based on my analysis, the four benchmarks:
v8.browsing_desktop
v8.browsing_desktop-future
system_health.memory_desktop
system_health.common_desktop
are responsible for 94.8% of the increase in runtime between those dates. Each is responsible for ~40 minutes of increased runtime.

I think that a reasonable plan for bringing this runtime back down is to require the owners of each of the four benchmarks to either
1. Disable stories and optimize the benchmark until the runtime is back to Oct 6th levels, or
2. Use src/tools/perf/expectations.config to disable stories just on 'Win 7 Perf' and 'Win 7 Nvidia Perf' until runtime is back to Oct 6th levels.

One trick is that you could run half your tests on 'Win 7 Perf' and half on 'Win 7 Nvidia Perf' in order to continue to have Win 7 coverage for all stories. For example, it might be reasonable to completely disable v8.browsing_desktop on 'Win 7 Perf' and completely disable v8.browsing_desktop-future on 'Win 7 Nvidia Perf'.

Note that this work is required because we cannot obtain more devices of the 'Win 7 Perf' and 'Win 7 Nvidia Perf' models that we currently have on the waterfall. The devices that we have now are old and no longer supported. We need to get a new fleet of Windows 7 devices, but deciding what to buy needs to be done carefully, and it will take a long time for new orders to be shipped, so we haven't done it yet.

Ushesh, could you please work with stakeholders to see if my plan above (or some edited version of it) is reasonable? If it gets LGTM from Speed and Ops TLs, then we should move forward to contact the benchmark owners with the above request.

I'm out on vacation next week.
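The attribution step described above can be sketched as follows: given per-benchmark runtimes at two dates, compute each benchmark's share of the total increase. This is not the code from the linked CL; the function name is hypothetical and the numbers in the usage example are placeholders, not measurements from the spreadsheet.

```python
def increase_shares(before, after):
    """Return each benchmark's fraction of the total runtime increase.

    before and after map benchmark name to runtime in seconds. Benchmarks
    missing from one side are treated as taking 0 seconds on that side.
    The result is ordered from the largest contributor to the smallest.
    """
    names = set(before) | set(after)
    deltas = {name: after.get(name, 0.0) - before.get(name, 0.0)
              for name in names}
    total = sum(deltas.values())
    if total <= 0:
        # No net increase, so there is nothing to attribute.
        return {}
    ordered = sorted(deltas.items(), key=lambda item: (-item[1], item[0]))
    return {name: delta / total for name, delta in ordered}


if __name__ == "__main__":
    # Placeholder runtimes in seconds, for illustration only.
    before = {"benchmark_a": 100, "benchmark_b": 100, "benchmark_c": 100}
    after = {"benchmark_a": 160, "benchmark_b": 130, "benchmark_c": 110}
    for name, share in increase_shares(before, after).items():
        print(f"{name}: {share:.1%}")
```

Summing the shares of the top few entries gives a figure comparable to the 94.8% quoted above for the four benchmarks.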
,
Dec 8
+cc cbruni@, ulan@: since the biggest increase comes from the System health stories refresh
,
Dec 12
I think by the end of Q1 we can disable the old benchmarks. Ideally, we need at least one full quarter of overlap to properly track down regressions. mlippautz@ recently updated vinn and got at least a 20% reduction in metric calculation time.

Also, I noticed that the number of added benchmarks for v8.browsing_desktop is quite a bit lower than expected.
v8.browsing_desktop         2733 6123 3390 24 31 7
v8.browsing_desktop-future  2493 5386 2893 22 29 7

I talked to ulan@ and he suggested disabling all the old (non-2018) stories on v8.browsing_desktop-future and v8.browsing_mobile-future. This should give us some leeway until we can fully disable the old stories on all bots.
,
Dec 17
Camillo, could you please take action on your above suggestion?
,
Dec 18
,
Dec 18
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/6440952598da58b100bfd274207abd0af4d6e2d8 commit 6440952598da58b100bfd274207abd0af4d6e2d8 Author: Camillo Bruni <cbruni@chromium.org> Date: Tue Dec 18 14:14:36 2018 [perf] Disable superseded v8.browsing_desktop-future benchmarks Due to resource constraints on the perf-waterfall we disable all benchmarks on v8.browsing_desktop-future and v8.browsing_mobile-future which have been superseded by newly recorded 2018 versions. These benchmarks test the --future configuration of V8 and thus do not require that much immediate coverage. Bug: 906654 Change-Id: Ibe8d716e721586221dcb28928c9264e95980c414 Reviewed-on: https://chromium-review.googlesource.com/c/1382472 Commit-Queue: Camillo Bruni <cbruni@chromium.org> Reviewed-by: Juan Antonio Navarro Pérez <perezju@chromium.org> Cr-Commit-Position: refs/heads/master@{#617478} [modify] https://crrev.com/6440952598da58b100bfd274207abd0af4d6e2d8/tools/perf/expectations.config
,
Jan 2
,
Jan 2
,
Jan 2
,
Jan 11
Comment 1 by jbudorick@chromium.org
, Nov 20