New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 595744 link

Starred by 1 user

Issue metadata

Status: Fixed
Owner:
Closed: Mar 2016
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Windows
Pri: 2
Type: Bug-Regression

Blocking:
issue 596067



Sign in to add a comment

There's no API to disable a benchmark on low-end Windows only

Project Member Reported by petrcermak@chromium.org, Mar 17 2016

Issue description

This is an issue when a benchmark is failing on low-end Windows perf bots and passing on all other Windows perf bots, like here: https://bugs.chromium.org/p/chromium/issues/detail?id=595737

sullivan: Could you please help me route this to the appropriate person?
 
Cc: eakuefner@chromium.org nednguyen@chromium.org
Ethan has done some work in this area recently. Ethan, any thoughts?
The only way to distinguish the failing bot in https://bugs.chromium.org/p/chromium/issues/detail?id=595737 that I can think of is to check the number of CPUs: it seems that the failing bot has only 2 CPUs while the others have more (including the other low-end bot).
We can add a field like: NumberOfCPUS() to platform object and do s.t similar to this: https://code.google.com/p/chromium/codesearch#chromium/src/tools/perf/benchmarks/blink_style.py&l=34

def ShouldDisabled(..)
   return possible_browser.platform.GetOSName() == 'Windows' and possible_browser.platform.GetNumCPUs() <= 2
We don't have a really good way of dealing with this, since as Petr notes there is not really anything useful to distinguish the Windows low-end perf configuration from others, besides the number of CPUs.

You could define a ShouldDisable that checks if the platform is Windows and the machine has only two cores, but that seems a bit scary and heavy-handed, since our machine configurations are constantly evolving.
If it's okay to go this route, though, Ned's solution sounds like what we want.

1. Add GetNumCPUs to DesktopPlatformBackend
2. Expose it through DesktopPlatform
3. Use it from ShouldDisable.
I agree with Ethan. The number of CPUs was only meant as a hack to solve https://bugs.chromium.org/p/chromium/issues/detail?id=595737.

I wonder if the bots could have some kind of a flag that says they're low-end? 

Right now when there's a failure on the low-end bot, there are only two options:
1. Live with a red bot until the failure is fixed.
2. Disable the failing benchmark on all Windows bots.
I don't think either of these options is acceptable.
I agree that both of these options are unacceptable.

This seems like the moral equivalent of `IsSvelte`, and I think we agree that it's worthwhile to be able to permanently specify that certain benchmarks should not run on sufficiently low-end configurations.

Is there an environment variable we can query to get the name of the bot? That would be slightly more robust, and slightly more transparent when it comes time once again to shift the hardware configurations around.
Cc: dtu@chromium.org
Ethan made a great point about "disabling the benchmark on windows machine with 2 cpus" is very superstitious. I think this also points out a huge flaw in our disabling test based on platform properties strategy. Maybe disabling tests due to firefighting reason should be done from bot recipes level? 

What if we have a dashboard with benchmarks on the rows & bots on columns for sheriffs to disable benchmark on specific bots? 

+Dave: how crazy do you think it would be to build s.t like this?

Actually, I recall that the discussion on extending the decorator functionality also landed on something like this: that actually there should be a dashboard from which sheriffs enable and disable benchmarks and input reasons. Something like sheriff-o-matic, but for benchmarks.
Re #8: We used to have enable/disable in the buildbot side (before recipes), and it was incredibly confusing for everyone:
* People would forget about disabled tests
* People would make changes to tests and not even know they were disabled

I don't think we should go in that direction, at least not in the short term.

Comment 11 by dtu@chromium.org, Mar 17 2016

Disabling from a dashboard is very doable, especially if the benchmarks are also defined by a dashboard. I have some hand-wavey ideas about what the bot sheriffing flow could look like.

I think it was very insightful when you said that sheriffs only need to disable benchmarks to turn the waterfall green. If we have a sheriffing UI separate from the waterfall that shows failures in a different way, we don't need to disable things, which is also advantageous because it's easier to debug a failure that's currently happening.
Owner: nednguyen@chromium.org
Status: Started (was: Untriaged)
Sorry for making this stagnant, I think we have a couple of good ideas here. The discussion of what is the right way for sheriffs to handle test failure should be moved to a different thread. 


For now let's do the pragmatic thing which is introducing the NumCPUs API to unblock  issue 595737  & issue  issue 596067  (both P1). 

Unfortunately, that won't help with  issue 596067  because the Zenbook is not the only bot with 4 CPUs (e.g. https://uberchromegw.corp.google.com/i/chromium.perf/builders/Win%208%20Perf%20%285%29/builds/3869/steps/bot_update/logs/stdio).

I don't think there's any point in implementing the CPU API. As a short-term solution, I can just import the multiprocessing module and read the number of CPUs from there. As we all agreed, number of CPUs is not a good long-term solution.
Actually introducing GetNumCPus() and make should it work on win/mac/linux/cros/cros-remote/android is asking for more trouble that it needs for this one-off use cases. I will just use the multiprocessing.cpu_count() directly since we only support windows as a host platform.

Disable CLs are:
https://codereview.chromium.org/1815243002/
https://codereview.chromium.org/1811353002/
As discussed with Ned, for the Zenbook, the following might work:

def ShouldDisable(self, possible_browser):
  return 'Zenbook' in os.getenv('BUILDBOT_BUILDERNAME', '')

but again, this is a hack.
Blockedon: 596113
Owner: nedngu...@google.com
The CLs are cq'ed but can't land due to  issue 596113  (CQ_EXTRA_TRYBOTS timed out)


It's not easy to address   issue 596113 , so I will land the disable patches without the CQ_EXTRA_TRYBOTS run
Project Member

Comment 18 by bugdroid1@chromium.org, Mar 18 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/15047b716f232fc75525e4afdd81eb892da75640

commit 15047b716f232fc75525e4afdd81eb892da75640
Author: nednguyen <nednguyen@google.com>
Date: Fri Mar 18 20:19:19 2016

Disable page_cycler.basic_oopif on zenbook bot

BUG= 596067 ,  595744 

Review URL: https://codereview.chromium.org/1815243002

Cr-Commit-Position: refs/heads/master@{#382065}

[modify] https://crrev.com/15047b716f232fc75525e4afdd81eb892da75640/tools/perf/benchmarks/page_cycler.py

Blockedon: -596113
Seems like disabling using environment variable doesn't work:
https://uberchromegw.corp.google.com/i/chromium.perf/builders/Win%20Zenbook%20Perf%20%284%29/builds/1150
Blocking: -595737
Status: Fixed (was: Started)
I will work on fixing  issue 596067  (zenbook one), hence probably no need to disable that benchmark on zenbook using environment variable.
Project Member

Comment 22 by bugdroid1@chromium.org, Mar 20 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/88167f9e85f7b897deab1ce4b4ff93146f03d32b

commit 88167f9e85f7b897deab1ce4b4ff93146f03d32b
Author: nednguyen <nednguyen@google.com>
Date: Sun Mar 20 20:24:57 2016

Revert of Disable page_cycler.basic_oopif on zenbook bot (patchset #3 id:40001 of https://codereview.chromium.org/1815243002/ )

Reason for revert:
Did not disable the benchmark:
https://uberchromegw.corp.google.com/i/chromium.perf/builders/Win%20Zenbook%20Perf%20%284%29/builds/1150

Original issue's description:
> Disable page_cycler.basic_oopif on zenbook bot
>
> BUG= 596067 ,  595744 
>
> Committed: https://crrev.com/15047b716f232fc75525e4afdd81eb892da75640
> Cr-Commit-Position: refs/heads/master@{#382065}

TBR=petrcermak@chromium.org,eakuefner@chromium.org
# Not skipping CQ checks because original CL landed more than 1 days ago.
BUG= 596067 ,  595744 

Review URL: https://codereview.chromium.org/1819743002

Cr-Commit-Position: refs/heads/master@{#382229}

[modify] https://crrev.com/88167f9e85f7b897deab1ce4b4ff93146f03d32b/tools/perf/benchmarks/page_cycler.py

The disable didn't work, and was reverted. Is this still blocking 595737? If so, we should reopen.
As far as I understand, it's not blocking it because it was fixed by increasing timeout instead: https://codereview.chromium.org/1816873002/

Sign in to add a comment