Update average times for perf desktop benchmarks
Issue description:

I just discovered this file today: https://cs.chromium.org/chromium/src/tools/perf/desktop_benchmark_avg_times.json?q=tools/perf/deskto+package:%5Echromium$&l=1

This file controls how we shard everything on the perf waterfall, and it is out of date: roughly 75 tests have been added since it was last updated, which has thrown off the sharding. I discovered this because I got some data about how we shard the bots and saw a very obvious failure here: https://screenshot.googleplex.com/cEFvc5fnFnC We could easily move system_health.memory_desktop to a different bot and save a lot of time on the build. However, the sharding looks pretty badly off on most of the bots.

I am able to get new averages of the times, and I've attached a new JSON file with the updated timings. The generate-perf-json script seems to error out when I run it now; I'm not sure yet what's wrong with it.

I'm also worried about changing the affinity of all these bots. That would disrupt the data on the perf dashboard, correct?
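(For context: a file like this is essentially a map from benchmark name to average runtime, and the generator script uses those times to balance benchmarks across the bots attached to each builder. The snippet below is a hypothetical sketch of that general idea, a greedy longest-first assignment over made-up benchmark names and times; the real file format and generator script may differ.)

# Hypothetical sketch of how average benchmark times could drive sharding.
# The real desktop_benchmark_avg_times.json format and the real generator
# script may differ; names and numbers below are made up for illustration.
import heapq
import json

def shard_benchmarks(avg_times, num_shards):
    """Greedily assign benchmarks to shards, longest first, so that the
    total estimated runtime of each shard stays roughly balanced."""
    # Min-heap of (total_seconds_on_shard, shard_index).
    heap = [(0.0, i) for i in range(num_shards)]
    heapq.heapify(heap)
    assignment = {}
    for benchmark, seconds in sorted(
            avg_times.items(), key=lambda kv: kv[1], reverse=True):
        total, shard = heapq.heappop(heap)
        assignment[benchmark] = shard
        heapq.heappush(heap, (total + seconds, shard))
    return assignment

if __name__ == '__main__':
    # Stand-in for loading desktop_benchmark_avg_times.json.
    avg_times = {
        'system_health.memory_desktop': 5400.0,
        'blink_perf.parser': 600.0,
        'page_cycler_v2.intl_hi_ru': 900.0,
        'v8.todomvc': 300.0,
    }
    print(json.dumps(shard_benchmarks(avg_times, num_shards=2), indent=2))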
Feb 1 2017
https://codereview.chromium.org/2666803004 regenerates the json.
Feb 1 2017
https://codereview.chromium.org/2667023003 is the CL for updating the new average times.
Feb 1 2017
Thanks so much for taking a look at the device affinity! Changing the affinity creates a one-time discontinuity in data each time we do it. It is definitely okay to do sometimes; we actually planned for once a quarter. Having a single discontinuity is sometimes unavoidable, and much less disruptive than just not having affinity at all. The big thing we would want to do is warn the perf sheriffs this will happen. If we know:
* which tests and bots will be affected
* what the likely commit position is
we can do the following:
* email both the mailing list and the sheriff likely to be on duty when the CL lands
* put a warning in the CL description: "PERF SHERIFFS: This is likely to cause an expected change in the following tests and bots..."
The big question I have is whether we'll be able to report an exact commit position for this. The recipe pulls in the version of testing/buildbot/chromium.perf.json at the CL being tested, right? So it should upload data to the perf dashboard with that commit position in the range, and then it would be really clear this CL could be the cause?
Feb 1 2017
Annie - is there some process we could add to our quarterly/yearly pageset refresh here?
Feb 1 2017
I think we'll need a process for the quarterly/yearly pageset refresh, but whether we also do a quarterly refresh of the device-affinity timings probably depends on our longer-term plan for device affinity: we have a hand-wavy plan to do per-story sharding, which depends on moving the benchmarks into a single step. We need to flesh that out a bit more before we decide whether to roll it into the pageset refresh.
Feb 1 2017
Re #4: This all sounds fine. The CL changing the bot affinity should show up in the commit position range, and the data should be uploaded to the perf dashboard, so we should be good. I'll work on doing this today.
Feb 2 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/891d0d69870da1d4470170dde087afe354e01e15

commit 891d0d69870da1d4470170dde087afe354e01e15
Author: martiniss <martiniss@chromium.org>
Date: Thu Feb 02 18:15:23 2017

Regenerate //testing/buildbot/chromium.perf.json

PERF SHERIFFS: This CL should cause the 'v8.mobile_infinite_scroll-turbo_tbmv2' benchmark to start running on the bot. No other change is expected.

BUG=687425
Review-Url: https://codereview.chromium.org/2666803004
Cr-Commit-Position: refs/heads/master@{#447797}

[modify] https://crrev.com/891d0d69870da1d4470170dde087afe354e01e15/testing/buildbot/chromium.perf.fyi.json
[modify] https://crrev.com/891d0d69870da1d4470170dde087afe354e01e15/testing/buildbot/chromium.perf.json
Feb 2 2017
I'm gathering some data about how the affinity would change. Based on preliminary analysis, the change percentages look like this:

* Mac Mini 8GB 10.12 Perf: 78.06%
* Win 7 ATI GPU Perf: 76.50%
* Win 7 Perf: 76.88%
* Win 7 Intel GPU Perf: 76.88%
* Mac Air 10.11 Perf: 78.06%
* Mac Retina Perf: 78.06%
* Linux Perf: 76.50%
* Win 8 Perf: 76.88%
* Mac Pro 10.11 Perf: 78.06%
* Win 10 Perf: 77.66%
* Win 7 x64 Perf: 77.27%
* Mac 10.11 Perf: 77.66%
* Win 7 Nvidia GPU Perf: 76.50%
* Mac 10.12 Perf: 78.06%
* Win 10 High-DPI Perf: 80.61%
* Win Zenbook Perf: 80.61%

This percentage is supposed to represent how many tests are changing device affinity. I need to double-check this number, but I believe it to be accurate. Overall, this is a pretty large amount of redistribution. I want to try to do some simulations of how much time we're saving right now, but that might be hard to calculate.
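(For reference, "change percentage" here means the fraction of a builder's benchmarks whose device affinity differs between the old and new sharding. A minimal sketch of that comparison, assuming the old and new assignments are available as benchmark-to-shard maps; the real data comes from the generated chromium.perf.json and isn't shown in this bug.)

# Hypothetical sketch: fraction of benchmarks whose shard assignment changed.
# Assumes old/new assignments are dicts of benchmark -> shard; the real
# inputs would come from the old and new generated perf JSON.
def change_percentage(old_assignment, new_assignment):
    common = set(old_assignment) & set(new_assignment)
    if not common:
        return 0.0
    changed = sum(1 for b in common if old_assignment[b] != new_assignment[b])
    return 100.0 * changed / len(common)

old = {'blink_perf.parser': 0, 'v8.todomvc': 1, 'speedometer': 2}
new = {'blink_perf.parser': 2, 'v8.todomvc': 1, 'speedometer': 0}
print('change percentage: %.2f%%' % change_percentage(old, new))  # 66.67%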
Feb 2 2017
How important would it be to get that percentage change down? I could try to write a sharding algorithm that redistributes the tests while keeping the number of tests that change affinity small.
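(One way such an algorithm could work, purely as a sketch and not what was actually implemented: start from the existing assignment and only move benchmarks off shards whose estimated total runtime exceeds a budget, so the number of affinity changes stays small.)

# Hypothetical sketch of a low-churn rebalance: keep the current device
# affinity and only move benchmarks off overloaded shards, largest first,
# so only a few affinities change. Not the algorithm used on the waterfall.
def rebalance(assignment, avg_times, num_shards, max_shard_seconds):
    """assignment: benchmark -> shard index; avg_times: benchmark -> seconds."""
    loads = [0.0] * num_shards
    for benchmark, shard in assignment.items():
        loads[shard] += avg_times.get(benchmark, 0.0)

    moved = {}
    # Consider the most expensive benchmarks first so fewer moves are needed.
    for benchmark, shard in sorted(assignment.items(),
                                   key=lambda kv: avg_times.get(kv[0], 0.0),
                                   reverse=True):
        if loads[shard] <= max_shard_seconds:
            continue  # This shard is within budget; leave its tests alone.
        lightest = min(range(num_shards), key=lambda s: loads[s])
        if lightest == shard:
            continue
        seconds = avg_times.get(benchmark, 0.0)
        loads[shard] -= seconds
        loads[lightest] += seconds
        moved[benchmark] = lightest

    new_assignment = dict(assignment)
    new_assignment.update(moved)
    return new_assignment, moved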
Feb 2 2017
Annie, could you give some feedback on this? (It's fine if you haven't had a chance yet; I'm just assigning it to you to be explicit about next steps.) How okay are you with just proceeding, even if it's a bit risky? I can gather some data about how our sharding is doing after this lands and the bots run one cycle of tests.
Feb 3 2017
Something else to think about: these timings include (or can include) data for disabled tests. Removing those tests would give us a lot of time back and reduce cycle time significantly. If a test later gets fixed, it would need to be re-enabled, which would require a src-side CL. I think it would be reasonable to regenerate the src-side JSON every time a benchmark's disable/enable state changes, though; we could even enforce that in a presubmit check.
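(A rough sketch of what such a presubmit check could look like, using the standard Chromium PRESUBMIT.py entry points; the specific paths that are assumed to encode benchmark enable/disable state are illustrative only.)

# Rough sketch of a presubmit check that complains when benchmark state under
# tools/perf/ changes without the generated perf JSON being regenerated.
# CheckChangeOnUpload/AffectedFiles/PresubmitPromptWarning are the standard
# Chromium presubmit hooks; the exact paths checked here are assumptions.
def CheckChangeOnUpload(input_api, output_api):
    paths = [f.LocalPath() for f in input_api.AffectedFiles()]
    touched_benchmark_state = any(p.startswith('tools/perf/') for p in paths)
    regenerated = any(p.endswith('chromium.perf.json') for p in paths)
    if touched_benchmark_state and not regenerated:
        return [output_api.PresubmitPromptWarning(
            'Benchmark enable/disable state may have changed but '
            'testing/buildbot/chromium.perf.json was not regenerated; '
            'please re-run the perf JSON generator.')]
    return []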
Feb 3 2017
I came up with another idea which could help for now. Only four builders are really suffering right now, with builds actually timing out: "Linux Perf", "Mac 10.12 Perf", "Win Zenbook Perf", and "Mac Mini 8GB 10.12 Perf". The first three can get capacity back by redistributing their tests manually. The Mac Mini bot looks fairly hopeless in general, and rebalancing tests on it won't really help at the moment; its underlying issue is bug 686974. I believe the following changes would help:

Linux Perf:
* move webrtc.peerconnection to run on build149-m1
Mac 10.12 Perf:
* move page_cycler_v2.intl_hi_ru to run on build162-m1
* move blink_perf.parser to run on build162-m1
Win Zenbook Perf:
* move v8.todomvc to run on build31-m1

This is a pretty small list of tests to move, and it should give us some more breathing room. You can check this yourself in the attached screenshot, which shows the distribution of tests on each builder; some bots are clearly more oversubscribed than others, and this change should balance the load better. I still think it would be useful to build a much more accurate and optimized sharding algorithm, and possibly to stop running disabled tests, but both take some work, while this would be a quick CL to implement with good immediate benefit. I'll work on writing a CL to do the above changes.
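(A hypothetical sketch of how a handful of hand-picked overrides like these could be layered on top of the generated affinity; the benchmark and bot names come from the list above, but the override mechanism itself is illustrative, not the actual generator code.)

# Hypothetical sketch: layer a few hand-picked bot overrides on top of the
# generated affinity. Benchmark/bot names are taken from the plan above; the
# override mechanism itself is illustrative only.
MANUAL_AFFINITY_OVERRIDES = {
    'webrtc.peerconnection': 'build149-m1',      # Linux Perf
    'page_cycler_v2.intl_hi_ru': 'build162-m1',  # Mac 10.12 Perf
    'blink_perf.parser': 'build162-m1',          # Mac 10.12 Perf
    'v8.todomvc': 'build31-m1',                  # Win Zenbook Perf
}

def apply_overrides(assignment, overrides=MANUAL_AFFINITY_OVERRIDES):
    """Return a copy of the generated benchmark->bot map with manual fixes."""
    result = dict(assignment)
    result.update(overrides)
    return result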
Feb 3 2017
I have a CL, but I'm not uploading it yet because it's messy. Also, it looks like my math in #9 was wrong; I'll figure out what's wrong with it and try to get new percentages soon.
Feb 10 2017
I'm really sorry I was so slow to give feedback on this bug. But if the CL description has a PERF SHERIFFS note, I am okay with a very large shift in device affinity like this, especially if it only happens ~quarterly.
Feb 13 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/549d1501a66ed2ec0283149f65c71d64569183aa

commit 549d1501a66ed2ec0283149f65c71d64569183aa
Author: martiniss <martiniss@chromium.org>
Date: Mon Feb 13 23:56:46 2017

chromium.perf.json: Special case some desktop bots

This should help alleviate capacity issues we've been having for a bit. Longer-term solutions should be coming for this, but these particular bots are a bit overloaded right now.

PERF SHERIFFS: This CL will cause changes to the following benchmarks
* page_cycler_v2.intl_hi_ru on Mac 10.12 Perf
* blink_perf.parser on Mac 10.12 Perf
It also affects two other benchmarks, but those are already not sending data because of missing capacity, so those won't cause any jumps in timings.

BUG=687425, 680360, 685275
Review-Url: https://codereview.chromium.org/2692133002
Cr-Commit-Position: refs/heads/master@{#450159}

[modify] https://crrev.com/549d1501a66ed2ec0283149f65c71d64569183aa/testing/buildbot/chromium.perf.json
Dec 11 2017
I'm not working on this anymore. I believe Emily's planned work will fix this bug.
Dec 12 2018
This issue has been Available for over a year. If it's no longer important or seems unlikely to be fixed, please consider closing it out. If it is important, please re-triage the issue. Sorry for the inconvenience if the bug really should have been left as Available. For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Dec 13 2018
This is fixed/obsolete with the new sharding algorithm and the updated timings.