Issue metadata
Sign in to add a comment
|
There's no API to disable a benchmark on low-end Windows only |
||||||||||||||||||||||
Issue descriptionThis is an issue when a benchmark is failing on low-end Windows perf bots and passing on all other Windows perf bots, like here: https://bugs.chromium.org/p/chromium/issues/detail?id=595737 sullivan: Could you please help me route this to the appropriate person?
,
Mar 17 2016
The only way to distinguish the failing bot in https://bugs.chromium.org/p/chromium/issues/detail?id=595737 that I can think of is to check the number of CPUs: it seems that the failing bot has only 2 CPUs while the others have more (including the other low-end bot).
,
Mar 17 2016
We can add a field like: NumberOfCPUS() to platform object and do s.t similar to this: https://code.google.com/p/chromium/codesearch#chromium/src/tools/perf/benchmarks/blink_style.py&l=34 def ShouldDisabled(..) return possible_browser.platform.GetOSName() == 'Windows' and possible_browser.platform.GetNumCPUs() <= 2
,
Mar 17 2016
We don't have a really good way of dealing with this, since as Petr notes there is not really anything useful to distinguish the Windows low-end perf configuration from others, besides the number of CPUs. You could define a ShouldDisable that checks if the platform is Windows and the machine has only two cores, but that seems a bit scary and heavy-handed, since our machine configurations are constantly evolving.
,
Mar 17 2016
If it's okay to go this route, though, Ned's solution sounds like what we want. 1. Add GetNumCPUs to DesktopPlatformBackend 2. Expose it through DesktopPlatform 3. Use it from ShouldDisable.
,
Mar 17 2016
I agree with Ethan. The number of CPUs was only meant as a hack to solve https://bugs.chromium.org/p/chromium/issues/detail?id=595737. I wonder if the bots could have some kind of a flag that says they're low-end? Right now when there's a failure on the low-end bot, there are only two options: 1. Live with a red bot until the failure is fixed. 2. Disable the failing benchmark on all Windows bots. I don't think either of these options is acceptable.
,
Mar 17 2016
I agree that both of these options are unacceptable. This seems like the moral equivalent of `IsSvelte`, and I think we agree that it's worthwhile to be able to permanently specify that certain benchmarks should not run on sufficiently low-end configurations. Is there an environment variable we can query to get the name of the bot? That would be slightly more robust, and slightly more transparent when it comes time once again to shift the hardware configurations around.
,
Mar 17 2016
Ethan made a great point about "disabling the benchmark on windows machine with 2 cpus" is very superstitious. I think this also points out a huge flaw in our disabling test based on platform properties strategy. Maybe disabling tests due to firefighting reason should be done from bot recipes level? What if we have a dashboard with benchmarks on the rows & bots on columns for sheriffs to disable benchmark on specific bots? +Dave: how crazy do you think it would be to build s.t like this?
,
Mar 17 2016
Actually, I recall that the discussion on extending the decorator functionality also landed on something like this: that actually there should be a dashboard from which sheriffs enable and disable benchmarks and input reasons. Something like sheriff-o-matic, but for benchmarks.
,
Mar 17 2016
Re #8: We used to have enable/disable in the buildbot side (before recipes), and it was incredibly confusing for everyone: * People would forget about disabled tests * People would make changes to tests and not even know they were disabled I don't think we should go in that direction, at least not in the short term.
,
Mar 17 2016
Disabling from a dashboard is very doable, especially if the benchmarks are also defined by a dashboard. I have some hand-wavey ideas about what the bot sheriffing flow could look like. I think it was very insightful when you said that sheriffs only need to disable benchmarks to turn the waterfall green. If we have a sheriffing UI separate from the waterfall that shows failures in a different way, we don't need to disable things, which is also advantageous because it's easier to debug a failure that's currently happening.
,
Mar 18 2016
Sorry for making this stagnant, I think we have a couple of good ideas here. The discussion of what is the right way for sheriffs to handle test failure should be moved to a different thread. For now let's do the pragmatic thing which is introducing the NumCPUs API to unblock issue 595737 & issue issue 596067 (both P1).
,
Mar 18 2016
Unfortunately, that won't help with issue 596067 because the Zenbook is not the only bot with 4 CPUs (e.g. https://uberchromegw.corp.google.com/i/chromium.perf/builders/Win%208%20Perf%20%285%29/builds/3869/steps/bot_update/logs/stdio). I don't think there's any point in implementing the CPU API. As a short-term solution, I can just import the multiprocessing module and read the number of CPUs from there. As we all agreed, number of CPUs is not a good long-term solution.
,
Mar 18 2016
Actually introducing GetNumCPus() and make should it work on win/mac/linux/cros/cros-remote/android is asking for more trouble that it needs for this one-off use cases. I will just use the multiprocessing.cpu_count() directly since we only support windows as a host platform. Disable CLs are: https://codereview.chromium.org/1815243002/ https://codereview.chromium.org/1811353002/
,
Mar 18 2016
As discussed with Ned, for the Zenbook, the following might work:
def ShouldDisable(self, possible_browser):
return 'Zenbook' in os.getenv('BUILDBOT_BUILDERNAME', '')
but again, this is a hack.
,
Mar 18 2016
The CLs are cq'ed but can't land due to issue 596113 (CQ_EXTRA_TRYBOTS timed out)
,
Mar 18 2016
It's not easy to address issue 596113 , so I will land the disable patches without the CQ_EXTRA_TRYBOTS run
,
Mar 18 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/15047b716f232fc75525e4afdd81eb892da75640 commit 15047b716f232fc75525e4afdd81eb892da75640 Author: nednguyen <nednguyen@google.com> Date: Fri Mar 18 20:19:19 2016 Disable page_cycler.basic_oopif on zenbook bot BUG= 596067 , 595744 Review URL: https://codereview.chromium.org/1815243002 Cr-Commit-Position: refs/heads/master@{#382065} [modify] https://crrev.com/15047b716f232fc75525e4afdd81eb892da75640/tools/perf/benchmarks/page_cycler.py
,
Mar 20 2016
,
Mar 20 2016
Seems like disabling using environment variable doesn't work: https://uberchromegw.corp.google.com/i/chromium.perf/builders/Win%20Zenbook%20Perf%20%284%29/builds/1150
,
Mar 20 2016
I will work on fixing issue 596067 (zenbook one), hence probably no need to disable that benchmark on zenbook using environment variable.
,
Mar 20 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/88167f9e85f7b897deab1ce4b4ff93146f03d32b commit 88167f9e85f7b897deab1ce4b4ff93146f03d32b Author: nednguyen <nednguyen@google.com> Date: Sun Mar 20 20:24:57 2016 Revert of Disable page_cycler.basic_oopif on zenbook bot (patchset #3 id:40001 of https://codereview.chromium.org/1815243002/ ) Reason for revert: Did not disable the benchmark: https://uberchromegw.corp.google.com/i/chromium.perf/builders/Win%20Zenbook%20Perf%20%284%29/builds/1150 Original issue's description: > Disable page_cycler.basic_oopif on zenbook bot > > BUG= 596067 , 595744 > > Committed: https://crrev.com/15047b716f232fc75525e4afdd81eb892da75640 > Cr-Commit-Position: refs/heads/master@{#382065} TBR=petrcermak@chromium.org,eakuefner@chromium.org # Not skipping CQ checks because original CL landed more than 1 days ago. BUG= 596067 , 595744 Review URL: https://codereview.chromium.org/1819743002 Cr-Commit-Position: refs/heads/master@{#382229} [modify] https://crrev.com/88167f9e85f7b897deab1ce4b4ff93146f03d32b/tools/perf/benchmarks/page_cycler.py
,
Mar 21 2016
The disable didn't work, and was reverted. Is this still blocking 595737? If so, we should reopen.
,
Mar 22 2016
As far as I understand, it's not blocking it because it was fixed by increasing timeout instead: https://codereview.chromium.org/1816873002/ |
|||||||||||||||||||||||
►
Sign in to add a comment |
|||||||||||||||||||||||
Comment 1 by sullivan@chromium.org
, Mar 17 2016