New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 680138 link

Starred by 1 user

Issue metadata

Status: Fixed
Owner:
Closed: Aug 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Bug

Blocking:
issue 689674
issue 695960
issue 696014
issue 707236


Participants' hotlists:
speed-ops-high-priority


Sign in to add a comment

reference build failures on the chromium.perf waterfall should not turn the waterfall red

Project Member Reported by eyaich@chromium.org, Jan 11 2017

Issue description

When the reference build of a benchmark fails the chromium.perf waterfall should not turn red.  

mikecase@ has already implemented this for android (although we will lose this work when we swarm android).  

He has a CL out: https://chromium-review.googlesource.com/c/422435/  but there were discussions on design of this.  

We have a few options, or more that haven't been considered: 

1) Go with something like this CL where we special case it for perf
2) Update telemetry to return success on reference build failures

Lets have some discussion on this bug about the best way to handle this so we can resolve this in the near future.




 
Is this bug actively owned? If so, can you update the status? If not, can we get an owner? This is affecting bot health sheriffs, who should not be seeing red reference builds while sheriffing.
Owner: martiniss@chromium.org
Status: Assigned (was: Untriaged)
Sorry, this is semi owned. I've been meaning to do this, but it's non trivial unfortunately, given how the desktop recipes work.

I don't think case has worked on this. I'll assign it to me, and take a look today.
Labels: -Performance-BotHealth Performance-Sheriff-BotHealth
Blocking: 689674
Ping.
Cc: dpranke@chromium.org
Ok, I've given this some thought.

There are a two solutions to this that I can think of:

One is to signal to the chromium recipe somehow that we don't want it to fail the build. Almost like the "can_fail_build" option we had a long time ago. 

There exists an "ok_ret" argument in recipes already, which could give us this flexibility. We could add something like that to the isolated scripts spec sort of thing, and then that would work. Although ideally we'd set the status to warning, because it isn't good that the reference build is failing, but it's not build failing.
 
The other is to signal to the catapult test runner that we don't want to fail the test when the test is a reference build. I don't like this, because it doesn't really make sense when running it locally myself (like when trying to reproduce).

Dirk, what do you think about option one? I feel like it'd be a lot of plumbing :/
The "ok_ret" option show a warning seems great to me. 

I don't support signal the catapult test runner (you meant benchmark runner?) to ignore reference build, because the logic of ignoring the failure of reference build come from the sheriffing workflow. Telemetry as a benchmarking framework should not have opinions about such thing.
I also agree that option one sounds more like the way to go. 

I don't quite think I understand what sort of reference build failures you're hitting and what the effect on the build is. Is this maybe the sort of thing where you could catch StepFailure in your recipe and handle the exceptions you care about, rather than have to plumb things through?
We're using the chromium recipe, and plan to in the future. We have a android/perf recipe, but we're actively trying to get rid of it.

That's why I said we need to add plumbing; we'll need to plumb something through the chromium recipe, AFAIK.
How do you distinguish between running a reference build and a regular build in the recipe now? Can you point me to something in the json files?
https://chromium.googlesource.com/chromium/src/+/master/testing/buildbot/chromium.perf.json#21665 is how we distinguish between reference and non-reference builds. Basically, a name change and an argument change.
Cc: phajdan.jr@chromium.org
Okay. Given that (which is roughly what I expected), I'm not sure I see an alternative other than doing something like adding a "can_fail_build" or "ok_ret" field to the JSON and plumbing support through.

I don't think that should be that much work, though.

Paweł, what do you think?
Blocking: 696014
Blocking: 695960
Does it make sense to have SoM ignore the reference failures as well?
Hey Stephen - this is listed as high priority - what's the plan?
Cc: -nednguyen@chromium.org
Owner: nedngu...@google.com
I can take this bug.

Plan:
1) Add "can_fail_build" bit to JSON file used by SwarmingIsolatedScriptTest
2) Tweaking tools/perf/generate_perf_data so that it adds "can_fail_build" bit to all the Telemetry reference build.
Project Member

Comment 18 by bugdroid1@chromium.org, Mar 29 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/tools/build/+/4daef50df40266df7d2dd58cea5b0c97b5ff9f63

commit 4daef50df40266df7d2dd58cea5b0c97b5ff9f63
Author: nednguyen <nednguyen@google.com>
Date: Wed Mar 29 09:26:09 2017

Add ignore_swarming_task_failure attribute for isolated_script

Rationale: we want to be able to ignore swarming task failure for
Chrome reference benchmark run (see attached bug).

This is implemented through passing ok_ret='any' to collect_isolated_script_task.py step
in swarming/api.py. 
This is to make sure that when there are unexpected infra failures 
in other part of the recipe , we don't ignore the failure.

BUG= chromium:680138 

Change-Id: Iff553c71853bec54b1350008fc2b26de7de34ab7
Reviewed-on: https://chromium-review.googlesource.com/458261
Reviewed-by: Stephen Martinis <martiniss@chromium.org>
Reviewed-by: John Budorick <jbudorick@chromium.org>
Commit-Queue: Ned Nguyen <nednguyen@google.com>

[add] https://crrev.com/4daef50df40266df7d2dd58cea5b0c97b5ff9f63/scripts/slave/recipes/chromium.expected/dynamic_swarmed_isolated_script_perf_test_ignore_failure.json
[modify] https://crrev.com/4daef50df40266df7d2dd58cea5b0c97b5ff9f63/scripts/slave/recipes/chromium.py
[modify] https://crrev.com/4daef50df40266df7d2dd58cea5b0c97b5ff9f63/scripts/slave/recipe_modules/swarming/api.py
[modify] https://crrev.com/4daef50df40266df7d2dd58cea5b0c97b5ff9f63/scripts/slave/recipe_modules/chromium_tests/steps.py

Project Member

Comment 19 by bugdroid1@chromium.org, Mar 31 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/tools/build/+/4e41fb0cdd4bda4b5a6dd9167dde76f0f63ceb0f

commit 4e41fb0cdd4bda4b5a6dd9167dde76f0f63ceb0f
Author: nednguyen <nednguyen@google.com>
Date: Fri Mar 31 00:03:52 2017

Make sure that step presentation is not set to FAILURE if exit_code is zero

In  crbug.com/680138 , we introduce 'ignore_swarming_task_failure' which allows
swarming isolated script tests to express that it's ok for it to fail.

This is a follow up after https://chromium-review.googlesource.com/458261 to
ensure that if the test's exit_code is zero, step presentation will not be set
to FAILURE even if there are unexpected failures.


BUG= 680138 

Change-Id: Id44491241918be6bd0c8acd85e9f9df995a96614
Reviewed-on: https://chromium-review.googlesource.com/462407
Reviewed-by: Tim 'mithro' Ansell <tansell@chromium.org>
Commit-Queue: Ned Nguyen <nednguyen@google.com>

[modify] https://crrev.com/4e41fb0cdd4bda4b5a6dd9167dde76f0f63ceb0f/scripts/slave/recipes/chromium.expected/dynamic_swarmed_isolated_script_perf_test_ignore_failure.json
[modify] https://crrev.com/4e41fb0cdd4bda4b5a6dd9167dde76f0f63ceb0f/scripts/slave/recipes/chromium.expected/dynamic_local_isolated_script_test_with_custom_results_handler.json
[modify] https://crrev.com/4e41fb0cdd4bda4b5a6dd9167dde76f0f63ceb0f/scripts/slave/recipes/chromium.expected/dynamic_swarmed_sharded_failed_isolated_script_test.json
[modify] https://crrev.com/4e41fb0cdd4bda4b5a6dd9167dde76f0f63ceb0f/scripts/slave/recipes/chromium.py
[modify] https://crrev.com/4e41fb0cdd4bda4b5a6dd9167dde76f0f63ceb0f/scripts/slave/recipe_modules/test_utils/test_api.py
[modify] https://crrev.com/4e41fb0cdd4bda4b5a6dd9167dde76f0f63ceb0f/scripts/slave/recipe_modules/chromium_tests/steps.py
[modify] https://crrev.com/4e41fb0cdd4bda4b5a6dd9167dde76f0f63ceb0f/scripts/slave/recipes/chromium.expected/dynamic_local_isolated_script_test_with_failed_json_results.json

Project Member

Comment 20 by bugdroid1@chromium.org, Mar 31 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/tools/build/+/4e41fb0cdd4bda4b5a6dd9167dde76f0f63ceb0f

commit 4e41fb0cdd4bda4b5a6dd9167dde76f0f63ceb0f
Author: nednguyen <nednguyen@google.com>
Date: Fri Mar 31 00:03:52 2017

Make sure that step presentation is not set to FAILURE if exit_code is zero

In  crbug.com/680138 , we introduce 'ignore_swarming_task_failure' which allows
swarming isolated script tests to express that it's ok for it to fail.

This is a follow up after https://chromium-review.googlesource.com/458261 to
ensure that if the test's exit_code is zero, step presentation will not be set
to FAILURE even if there are unexpected failures.


BUG= 680138 

Change-Id: Id44491241918be6bd0c8acd85e9f9df995a96614
Reviewed-on: https://chromium-review.googlesource.com/462407
Reviewed-by: Tim 'mithro' Ansell <tansell@chromium.org>
Commit-Queue: Ned Nguyen <nednguyen@google.com>

[modify] https://crrev.com/4e41fb0cdd4bda4b5a6dd9167dde76f0f63ceb0f/scripts/slave/recipes/chromium.expected/dynamic_swarmed_isolated_script_perf_test_ignore_failure.json
[modify] https://crrev.com/4e41fb0cdd4bda4b5a6dd9167dde76f0f63ceb0f/scripts/slave/recipes/chromium.expected/dynamic_local_isolated_script_test_with_custom_results_handler.json
[modify] https://crrev.com/4e41fb0cdd4bda4b5a6dd9167dde76f0f63ceb0f/scripts/slave/recipes/chromium.expected/dynamic_swarmed_sharded_failed_isolated_script_test.json
[modify] https://crrev.com/4e41fb0cdd4bda4b5a6dd9167dde76f0f63ceb0f/scripts/slave/recipes/chromium.py
[modify] https://crrev.com/4e41fb0cdd4bda4b5a6dd9167dde76f0f63ceb0f/scripts/slave/recipe_modules/test_utils/test_api.py
[modify] https://crrev.com/4e41fb0cdd4bda4b5a6dd9167dde76f0f63ceb0f/scripts/slave/recipe_modules/chromium_tests/steps.py
[modify] https://crrev.com/4e41fb0cdd4bda4b5a6dd9167dde76f0f63ceb0f/scripts/slave/recipes/chromium.expected/dynamic_local_isolated_script_test_with_failed_json_results.json

Project Member

Comment 21 by bugdroid1@chromium.org, Mar 31 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/9f01b1a8c8342b49b40ca21ebeb111465793c03f

commit 9f01b1a8c8342b49b40ca21ebeb111465793c03f
Author: Nghia Nguyen <NghiaNguyenBH@gmail.com>
Date: Fri Mar 31 09:38:37 2017

Set ignore_swarming_task_failure for all reference build task

The effect of this is all the reference builds's non-infra 
failures will show up of as warnings (yellow color). This
also makes sheriff-o-matic ignore the reference build failure.

BUG= chromium:680138 

Change-Id: Ia92132da707fcc667517163a2b4c6f82f5eb5a90
Reviewed-on: https://chromium-review.googlesource.com/462275
Reviewed-by: Juan Antonio Navarro Pérez <perezju@chromium.org>
Reviewed-by: Ned Nguyen <nednguyen@google.com>
Commit-Queue: Ned Nguyen <nednguyen@google.com>
Cr-Commit-Position: refs/heads/master@{#461089}
[modify] https://crrev.com/9f01b1a8c8342b49b40ca21ebeb111465793c03f/testing/buildbot/chromium.perf.fyi.json
[modify] https://crrev.com/9f01b1a8c8342b49b40ca21ebeb111465793c03f/testing/buildbot/chromium.perf.json
[modify] https://crrev.com/9f01b1a8c8342b49b40ca21ebeb111465793c03f/tools/perf/core/perf_data_generator.py
[modify] https://crrev.com/9f01b1a8c8342b49b40ca21ebeb111465793c03f/tools/perf/core/perf_data_generator_unittest.py

Blocking: 707236
Status: Fixed (was: Assigned)
The memory.desktop.reference benchmark is failing with a yellow code (warning) in https://build.chromium.org/p/chromium.perf/builders/Win%2010%20High-DPI%20Perf/builds/449

Also memory.desktop.reference on Win 10 High DPU doesn't show up on https://sheriff-o-matic.appspot.com/chromium.perf, so mark this as fixed.

File  issue 707236  about how to not waste too much bot time on reference build that just fails completely.
Cc: ashleymarie@chromium.org
Status: Started (was: Fixed)
After the work to convert telemetry benchmark to produce json test results ( issue 717394 ), reference build failures are showing up again.

https://build.chromium.org/p/chromium.perf/builders/Win%207%20Nvidia%20GPU%20Perf/builds/1113

Apparently my implementation was very specific to the simple json test format, I will investigate what to do with json test format :P

Actually blink_perf.events.reference failure is still showing yellow in https://build.chromium.org/p/chromium.perf/builders/Win%207%20Nvidia%20GPU%20Perf/builds/1113
 (whereas blink_perf.bindings.reference failure is showing red). So this is definitely something else.
Status: Fixed (was: Started)
Since this is a one off issue with blink_perf.events.reference, I file another bug instead:  issue 754514 .


Sign in to add a comment