
Issue 906654

Starred by 4 users

Issue metadata

Status: Started
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: ----

Blocked on:
issue 865539




Infra failure in performance_browser_tests on Win 7 Nvidia Perf, Win 7 Perf: shard timeouts.

Project Member Reported by sheriff-...@appspot.gserviceaccount.com, Nov 19

Issue description

Filed by sheriff-o-matic@appspot.gserviceaccount.com on behalf of sullivan@google.com

performance_browser_tests failing on multiple builders

Builders failed on: 
- Win 7 Nvidia GPU Perf: 
  https://ci.chromium.org/p/chrome/builders/luci.chrome.ci/Win%207%20Nvidia%20GPU%20Perf
- Win 7 Perf: 
  https://ci.chromium.org/p/chrome/builders/luci.chrome.ci/Win%207%20Perf

Can the trooper take a look?
 
These are infra failures because the builds are timing out. Looking at a sample build, all of the performance_test_suite shards run long (5+ hours), and some time out themselves (e.g. https://chrome-swarming.appspot.com/task?id=414974d263dd9e10&refresh=10&show_raw=1) despite the test having a 7 hour timeout.

Looking at https://ci.chromium.org/p/chrome/builders/luci.chrome.ci/Win%207%20Perf?limit=200, build times appear to have grown gradually over October. Should we just raise the timeouts for the build & test tasks?
Cc: ushesh@chromium.org
Ushesh, do you know who owns perf waterfall runtimes? Should we just increase the timeout?
Not my area anymore, but my 2 cents is that we should either add hardware or tell teams to remove tests. Increasing the timeout will lead to a slower waterfall, which comes with all sorts of problems: harder bisects, delayed sheriff reaction, ...
Cc: -nednguyen@chromium.org nedngu...@google.com
Is there anyone to investigate what has caused the runtime increase? I would not expect that to be a trooper task...
I am not sure whether the investigation of runtime increase would fall under CCA's or CCI's responsibility. Ushesh, any idea?
I also don't think this would be a CCI responsibility, fwiw.
How does the lack of testing capacity not involve CCI?
#8: sorry, I should have been more clear in #7. I don't think *investigating runtime increase* would be a CCI responsibility.
#9: that seems reasonable. Investigating the runtime increase seems to fall on the speed TPM's plate, imo.
Owner: ushesh@chromium.org
Ushesh, can you look into this?
Addressing some points raised above -

1. I don't know who specifically owns the perf waterfall runtimes. If I hadn't read through this bug, I would have assumed Ned but that doesn't seem to be the case any more. I feel like this would be whoever owns the perf bots and now this leads to my point 2.

2. To me, this is going back to the conversation we had about a month ago regarding the ownership of perf infrastructure in the long term and what the expectation is, currently, of CCI (which was determined to be fairly specific and narrow)

3. I can start looking into the runtime increase but don't really know enough about the infrastructure to go into super detail. I can do some basic analysis but will need to reach out to folks who have specific knowledge of these systems.
CCI doesn't canonically own investigation of functional test runtime regressions, either. (When we do investigate, we're biased toward turning things off.) I view this as more of a sheriff responsibility, in the same vein as a regression in the result of a test (a functional test failing, or a performance test regressing).
The fail vs regression comparison makes sense to me. Since this is a longer term trend, I think it would still be me looking into it, as opposed to a sheriff responsibility for the perf bot folks (I will probably work with them though).
Cc: eyaich@chromium.org crouleau@chromium.org
I understand that we don't want to increase the timeout long term, but that seems to be the only short-term solution to this problem. Then ushesh@ will be able to investigate the runtime increase carefully, and together we can come up with a thoughtful, systematic approach rather than being rushed by this timeout failure. In the meantime, let's keep the priority at P1 until we can figure out a long-term solution.
Summary: Infra failure in performance_browser_tests on Win 7 Nvidia Perf, Win 7 Perf: shard timeouts. (was: Infra failure in performance_browser_tests on Win 7 Nvidia Perf, Win 7 Perf)
I spoke with Ned and I am no longer so sure. We can increase the per shard expiration here https://cs.chromium.org/chromium/src/tools/perf/core/perf_data_generator.py?rcl=c416f80a900acf43478a6ad469ed0fa4af2e35aa&l=973 but that will increase it for all platforms. Our target is to keep platforms below 1 hour cycle time, and the expiration is currently set to 2 hours. If we increase this to, for example, 3 hours, will we notice if cycle time jumps up by 30 minutes? I don't like the hard failure being our alert for this problem, but maybe that is needed until we can set up softer alerts. Long term, we will need someone (maybe ushesh@?) to monitor these metrics and do things like order new devices ahead of time.
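To make the "softer alert" idea concrete, here is a minimal sketch, not an existing tool. It assumes we already have a measured cycle time per platform; the function name and the three-level result are hypothetical, and the two thresholds are just the 1 hour target and 2 hour expiration mentioned in this comment.

```python
from datetime import timedelta

# Values taken from the comment above; adjust per platform as needed.
TARGET_CYCLE = timedelta(hours=1)     # desired cycle time
HARD_EXPIRATION = timedelta(hours=2)  # current per shard expiration


def classify_cycle_time(cycle, target=TARGET_CYCLE, expiration=HARD_EXPIRATION):
    """Return 'ok', 'warn', or 'fail' for a measured cycle time.

    'warn' is the soft alert: the target was missed, but the hard
    expiration has not been hit yet.
    """
    if cycle >= expiration:
        return "fail"
    if cycle > target:
        return "warn"
    return "ok"


if __name__ == "__main__":
    for minutes in (45, 90, 180):
        cycle = timedelta(minutes=minutes)
        print(minutes, classify_cycle_time(cycle))
```

With a check like this, raising the hard expiration would not hide a 30 minute jump, because the warning threshold stays tied to the target rather than to the expiration.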
I definitely think it is worth spending some time on monitoring the run time of these tests on the infrastructure over time and being aware of what the increases are and why.  I think that was the next step to make sure we maintained our run times and had a good algorithm for obtaining new hardware and determining if we can add more tests to the suite.

-1 to increasing the timeout before we evaluate this.
Note that we have this script to fetch the benchmark & story runtime:
https://cs.chromium.org/chromium/src/tools/perf/core/retrieve_story_timing.py

We can easily add datetime as an extra field to the output.
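A rough sketch of what adding the datetime field could look like. This does not use retrieve_story_timing.py itself; the record fields ("name", "duration") and the "recorded_at" key are assumptions for illustration, and the real script's output may be shaped differently.

```python
import json
from datetime import datetime, timezone


def add_timestamp(records, now=None):
    """Return copies of timing records with an ISO-8601 'recorded_at' field.

    The input records are left unmodified.
    """
    stamp = (now or datetime.now(timezone.utc)).isoformat()
    return [dict(record, recorded_at=stamp) for record in records]


# Hypothetical record; field names are assumed, not taken from the script.
sample = [{"name": "benchmark_a/story_1", "duration": "12.5"}]

if __name__ == "__main__":
    fixed_time = datetime(2018, 11, 29, tzinfo=timezone.utc)
    print(json.dumps(add_timestamp(sample, fixed_time), indent=2))
```

Stamping each snapshot makes it possible to compare runs from different dates later on.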
Right but keeping track of these and providing useful analysis to both alert and answer the basic questions of:

1) How many tests?  Can we add more?
2) How much infrastructure do we need?

Storing this data and the monitoring piece is the part we never finalized a design around.
If there isn't a tool that does this on the CCI side already, perf dashboard can store this data and alert on it.
While ushesh@ is working on the long term plan, I will make the bandaid fix to increase timeouts of shard & builds.
Project Member

Comment 23 by bugdroid1@chromium.org, Nov 29

The following revision refers to this bug:
  https://chrome-internal.googlesource.com/chrome/src-internal.git/+/e35b8ef61038d0ef826d0dfe626f41c583007a90

commit e35b8ef61038d0ef826d0dfe626f41c583007a90
Author: Nghia Nguyen <nednguyen@google.com>
Date: Thu Nov 29 15:30:31 2018

Project Member

Comment 25 by bugdroid1@chromium.org, Nov 29

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/2738bf79ec9fe52db59c19e2891a9f49acb8b4d1

commit 2738bf79ec9fe52db59c19e2891a9f49acb8b4d1
Author: chromium-internal-autoroll <chromium-internal-autoroll@skia-corp.google.com.iam.gserviceaccount.com>
Date: Thu Nov 29 17:21:43 2018

Roll src-internal 73b3bde2659f..f38fffb7aa2f (2 commits)

https://chrome-internal.googlesource.com/chrome/src-internal.git/+log/73b3bde2659f..f38fffb7aa2f


Created with:
  gclient setdep -r src-internal@f38fffb7aa2f

The AutoRoll server is located here: https://autoroll-internal.skia.org/r/src-internal-chromium-autoroll

Documentation for the AutoRoller is here:
https://skia.googlesource.com/buildbot/+/master/autoroll/README.md

If the roll is causing failures, please contact the current sheriff, who should
be CC'd on the roll, and stop the roller if necessary.



BUG=chromium:906654
TBR=mmoss@chromium.org

Change-Id: I781599f8fc3ee65a6151606e25b935421aaeb03f
Reviewed-on: https://chromium-review.googlesource.com/c/1355360
Reviewed-by: chromium-internal-autoroll <chromium-internal-autoroll@skia-corp.google.com.iam.gserviceaccount.com>
Commit-Queue: chromium-internal-autoroll <chromium-internal-autoroll@skia-corp.google.com.iam.gserviceaccount.com>
Cr-Commit-Position: refs/heads/master@{#612247}
[modify] https://crrev.com/2738bf79ec9fe52db59c19e2891a9f49acb8b4d1/DEPS

*update: the suite is now working again.

https://ci.chromium.org/p/chrome/builders/luci.chrome.ci/Win%207%20Perf/3611

performance_test_suite on (102b) GPU on Windows on Windows-2008ServerR2-SP1 Run on OS: 'Windows-2008ServerR2-SP1'
Max pending time: 0:20:39.283899 (shard #0)
Max shard duration: 8:13:31.165188 (shard #3)
Min shard duration: 4:56:34.180917 (shard #0)
Total tests: 1228
* Passed: 1105 (1105 expected, 0 unexpected)
* Skipped: 121 (121 expected, 0 unexpected)
* Failed: 2 (0 expected, >>>2 unexpected<<<)
* Flaky: 0 (0 expected, 0 unexpected)
 
Unexpected Failures:
* rendering.desktop/css_animations_staggered_style_element
* system_health.common_desktop/multitab:misc:typical24:2018


performance_test_suite on NVIDIA GPU on Windows on Windows-2008ServerR2-SP1 Run on OS: 'Windows-2008ServerR2-SP1'
Max pending time: 0:14:12.713929 (shard #3)
Max shard duration: 8:09:39.391104 (shard #3)
Min shard duration: 4:45:37.913856 (shard #0)
Total tests: 1228
* Passed: 1175 (1175 expected, 0 unexpected)
* Skipped: 50 (50 expected, 0 unexpected)
* Failed: 3 (0 expected, >>>3 unexpected<<<)
* Flaky: 0 (0 expected, 0 unexpected)
 
Unexpected Failures:
* speedometer2-future/Speedometer2
* system_health.common_desktop/multitab:misc:typical24:2018
* tracing.tracing_with_background_memory_infra/http://www.amazon.com

(https://ci.chromium.org/p/chrome/builders/luci.chrome.ci/Win%207%20Nvidia%20GPU%20Perf/3426)


The shards are badly unbalanced. I will send a CL to reshard "Win 7 Nvidia Perf" & "Win 7 Perf".
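For reference, the imbalance can be quantified from the summary above. This is only a small sketch: the helper names are made up, and the two durations are copied from the Win 7 Perf build output in this comment.

```python
from datetime import timedelta


def parse_duration(text):
    """Parse an 'H:MM:SS.ffffff' duration as printed in the build summary."""
    hours, minutes, seconds = text.split(":")
    return timedelta(hours=int(hours), minutes=int(minutes), seconds=float(seconds))


def imbalance_ratio(max_shard, min_shard):
    """Ratio of the slowest shard to the fastest shard (1.0 means balanced)."""
    return parse_duration(max_shard) / parse_duration(min_shard)


# Max and min shard durations from the Win 7 Perf summary above.
ratio = imbalance_ratio("8:13:31.165188", "4:56:34.180917")

if __name__ == "__main__":
    print(f"slowest/fastest shard ratio: {ratio:.2f}")
```

Since the build finishes only when the slowest shard does, a ratio well above 1.0 means the overall cycle time is longer than it would be with evenly loaded shards.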
Project Member

Comment 27 by bugdroid1@chromium.org, Dec 3

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/ce92577b8b416e1a5566333a846488ce79dc0fd0

commit ce92577b8b416e1a5566333a846488ce79dc0fd0
Author: Ned Nguyen <nednguyen@google.com>
Date: Mon Dec 03 23:28:21 2018

Update the timing & shard maps for win 7 perf bots

** NOTE TO PERF SHERIFF **: this CL might cause false regressions on "Win 7 Nvidia GPU Perf" & "Win 7 Perf"
that are safe to ignore

Commandline:
$ ./tools/perf/generate_perf_sharding update --builders="Win 7 Nvidia GPU Perf" -r
$ ./tools/perf/generate_perf_sharding update --builders="Win 7 Perf" -r


Bug:906654
Change-Id: I420051f0629a8b8c79ac2e4da6cc736e911b02bb

NOTRY=true # ios-simulator flake

Change-Id: I420051f0629a8b8c79ac2e4da6cc736e911b02bb
Reviewed-on: https://chromium-review.googlesource.com/c/1358911
Commit-Queue: Ned Nguyen <nednguyen@google.com>
Reviewed-by: Emily Hanley <eyaich@chromium.org>
Reviewed-by: Caleb Rouleau <crouleau@chromium.org>
Cr-Commit-Position: refs/heads/master@{#613315}
[modify] https://crrev.com/ce92577b8b416e1a5566333a846488ce79dc0fd0/tools/perf/core/shard_maps/timing_data/win_7_nvidia_gpu_perf_timing.json
[modify] https://crrev.com/ce92577b8b416e1a5566333a846488ce79dc0fd0/tools/perf/core/shard_maps/timing_data/win_7_perf_timing.json
[modify] https://crrev.com/ce92577b8b416e1a5566333a846488ce79dc0fd0/tools/perf/core/shard_maps/win_7_nvidia_gpu_perf_map.json
[modify] https://crrev.com/ce92577b8b416e1a5566333a846488ce79dc0fd0/tools/perf/core/shard_maps/win_7_perf_map.json
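As background on what resharding from timing data aims for, here is a sketch of one common heuristic, greedy longest-processing-time-first assignment. This is an illustration only and not necessarily how generate_perf_sharding works; the function name and the story timings are hypothetical.

```python
import heapq


def greedy_shard(timings, num_shards):
    """Assign stories to shards, always giving the next-longest story
    to the shard with the smallest total so far.

    timings: dict mapping story name to expected runtime in seconds.
    Returns a list of (total_seconds, [stories]) sorted by total.
    """
    # Heap entries are (total, shard_index, stories); the unique index
    # breaks ties so the story lists are never compared.
    heap = [(0.0, index, []) for index in range(num_shards)]
    heapq.heapify(heap)
    longest_first = sorted(timings.items(), key=lambda item: item[1], reverse=True)
    for story, seconds in longest_first:
        total, index, stories = heapq.heappop(heap)
        stories.append(story)
        heapq.heappush(heap, (total + seconds, index, stories))
    return sorted((total, stories) for total, _, stories in heap)


if __name__ == "__main__":
    # Hypothetical timings, in seconds.
    example = {"a": 50, "b": 40, "c": 30, "d": 20, "e": 10}
    for total, stories in greedy_shard(example, 2):
        print(total, stories)
```

The quality of any such assignment depends on how current the timing data is, which is why the CL above updates the timing files together with the shard maps.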

Components: Speed>Benchmarks>Waterfall
Labels: -Infra-Troopers
Status: Assigned (was: Available)
I did an analysis of where the runtime increase on Win 7 Perf came from between Oct 6, 2018 and Dec 6, 2018. I used this code to get the data: https://chromium-review.googlesource.com/c/chromium/src/+/1369051

My data is at https://docs.google.com/spreadsheets/d/1Vi37tr3kmhWQ07LYogpaAcvzxrOxoUuHNOplQEfFdFc/edit#gid=573964993

Based on my analysis, the four benchmarks:

v8.browsing_desktop
v8.browsing_desktop-future
system_health.memory_desktop
system_health.common_desktop

are responsible for 94.8% of the increase in runtime between those dates.

Each is responsible for ~40 minutes of increased runtime. 
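The attribution itself is simple arithmetic: per benchmark, subtract the earlier runtime from the later one and divide by the total increase. A minimal sketch follows; it is not the code from the CL linked above, and the benchmark names and minute values in the example are hypothetical, not my actual data.

```python
def runtime_increase_share(before, after):
    """Rank benchmarks by runtime increase between two snapshots.

    before/after: dicts mapping benchmark name to runtime (same unit).
    Returns a list of (name, delta, share_of_total_increase), largest first.
    """
    names = set(before) | set(after)
    deltas = {name: after.get(name, 0.0) - before.get(name, 0.0) for name in names}
    total = sum(deltas.values())
    # Sort by delta descending, then by name so ties are deterministic.
    ranked = sorted(deltas.items(), key=lambda item: (-item[1], item[0]))
    return [(name, delta, delta / total if total else 0.0) for name, delta in ranked]


if __name__ == "__main__":
    # Hypothetical runtimes in minutes.
    before = {"bench_a": 30, "bench_b": 20, "bench_c": 10}
    after = {"bench_a": 70, "bench_b": 25, "bench_c": 15}
    for name, delta, share in runtime_increase_share(before, after):
        print(f"{name}: +{delta} min ({share:.1%} of increase)")
```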

I think that a reasonable plan for reducing this runtime back is to require the owners for each of the four benchmarks to either

1. Disable stories and optimize the benchmark until the runtime is back to Oct 6th levels

or

2. Use src/tools/perf/expectations.config to disable stories just on 'Win 7 Perf' and 'Win 7 Nvidia Perf' until runtime is back to Oct 6th levels. One trick is that you could run half your tests on 'Win 7 Perf' and half on 'Win 7 Nvidia Perf' in order to continue to have Win 7 coverage for all stories. For example, it might be reasonable to completely disable v8.browsing_desktop on 'Win 7 Perf' and completely disable v8.browsing_desktop-future on 'Win 7 Nvidia Perf'.
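To illustrate the half-and-half trick in option 2, here is a sketch that decides which stories to disable on which builder so that every story still runs on one of the two. It only computes the split; it does not write expectations.config entries, and the function and story names are hypothetical.

```python
def split_disabled_stories(stories, builders=("Win 7 Perf", "Win 7 Nvidia Perf")):
    """Map each builder to the stories to disable on it.

    Stories are assigned round-robin in sorted order, so each story is
    disabled on exactly one builder and stays enabled on the other(s).
    """
    disabled = {builder: [] for builder in builders}
    for index, story in enumerate(sorted(stories)):
        disabled[builders[index % len(builders)]].append(story)
    return disabled


if __name__ == "__main__":
    # Hypothetical story names.
    plan = split_disabled_stories(["story_1", "story_2", "story_3", "story_4"])
    for builder, stories in plan.items():
        print(builder, "->", stories)
```

A plain alternating split ignores per-story runtime; if a few stories dominate, splitting by measured duration would balance the savings better.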


Note that this work is required because we cannot obtain more devices of the 'Win 7 Perf' and 'Win 7 Nvidia Perf' models that we currently have on the waterfall. The devices that we have now are old and no longer supported. We need to get a new fleet of Windows 7 devices, but deciding what to buy needs to be done carefully, and it will take a long time for new orders to be shipped, so we haven't done it yet.

Ushesh, could you please work with stakeholders to see if my plan above (or some edited version of it) is reasonable? If it gets LGTM from Speed and Ops TLs, then we should move forward and contact the benchmark owners with the above request. I'm out on vacation next week.
Cc: u...@chromium.org cbruni@chromium.org
+cc cbruni@, ulan@: since the biggest increase comes from the System health stories refresh


I think by the end of Q1 we can disable the old benchmarks.
Ideally, we do need at least one full quarter overlap to properly track down regressions.

mlippautz@ recently updated vinn and got at least 20% reduction in metric calculation time.

Also I noticed that the number of added benchmarks for v8.browsing_desktop is quite a bit lower than expected.

v8.browsing_desktop	        2733	6123	3390	24	31	7
v8.browsing_desktop-future	2493	5386	2893	22	29	7


I talked to ulan@ and he suggested disabling all the old (non-2018) stories on v8.browsing_desktop-future and v8.browsing_mobile-future.
This should give us some leeway until we can fully disable the old stories on all bots.
Owner: cbruni@chromium.org
Camillo, could you please take action on your above suggestion?
Status: Started (was: Assigned)
Project Member

Comment 33 by bugdroid1@chromium.org, Dec 18

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/6440952598da58b100bfd274207abd0af4d6e2d8

commit 6440952598da58b100bfd274207abd0af4d6e2d8
Author: Camillo Bruni <cbruni@chromium.org>
Date: Tue Dec 18 14:14:36 2018

[perf] Disable superseded v8.browsing_desktop-future benchmarks

Due to resource constraints on the perf-waterfall we disable all
benchmarks on v8.browsing_desktop-future and v8.browsing_mobile-future
which have been superseded by newly recorded 2018 versions.

These benchmarks test the --future configuration of V8 and thus do not
require that much immediate coverage.

Bug: 906654
Change-Id: Ibe8d716e721586221dcb28928c9264e95980c414
Reviewed-on: https://chromium-review.googlesource.com/c/1382472
Commit-Queue: Camillo Bruni <cbruni@chromium.org>
Reviewed-by: Juan Antonio Navarro Pérez <perezju@chromium.org>
Cr-Commit-Position: refs/heads/master@{#617478}
[modify] https://crrev.com/6440952598da58b100bfd274207abd0af4d6e2d8/tools/perf/expectations.config

Cc: mythria@chromium.org
Cc: -nedngu...@google.com
Blockedon: 865539
Cc: jmad...@chromium.org
