
Issue 768384

Starred by 1 user

Issue metadata

Status: Archived
Owner:
Closed: Oct 4
Cc:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 3
Type: Bug

Blocking:
issue 769710




Posted bisect output does not match raw build data

Project Member Reported by perezju@chromium.org, Sep 25 2017

Issue description

On crbug.com/763955#3 the bisect posted

=== BISECT JOB RESULTS ===
Perf regression found but unable to narrow commit range

Build failures prevented the bisect from narrowing the range further.


Bisect Details
  Configuration: android_webview_arm64_aosp_perf_bisect
  Benchmark    : system_health.webview_startup_multiprocess
  Metric       : webview_startup_wall_time_avg/blank_about/blank_about_blank
  Change       : 5.17% | 156.028071429 -> 162.956571429

Suspected Commit Range
  4 commits in range
  https://chromium.googlesource.com/chromium/src/+log/a606d6e0020c013eee1862afcd4018338588abf7..ed9ec796e62619560c0a8ca788f3ee1dc49e7c07


Revision             Result                  N
chromium@471824      156.028 +- 20.865       14       good
chromium@471891      156.495 +- 19.1133      14       good
chromium@471892      ---                     ---      build failure
chromium@471893      ---                     ---      build failure
chromium@471894      ---                     ---      build failure
chromium@471895      162.24 +- 29.1168       14       bad
chromium@471899      167.241 +- 7.60149      6        bad
chromium@471909      162.427 +- 9.67142      9        bad
chromium@471924      162.622 +- 23.0431      14       bad
chromium@471956      162.957 +- 22.8165      14       bad

To Run This Test
  src/tools/perf/run_benchmark -v --browser=android-webview --output-format=chartjson --upload-results --pageset-repeat=1 --also-run-disabled-tests --story-filter=blank.about.blank system_health.webview_startup_multiprocess

More information on addressing performance regressions:
  http://g.co/ChromePerformanceRegressions

Debug information about this bisect:
  https://chromeperf.appspot.com/buildbucket_job_status/8968733382489024640
==========================

However the corresponding build at:
https://luci-milo.appspot.com/buildbot/tryserver.chromium.perf/android_webview_arm64_aosp_perf_bisect/1677

- does not appear to have any failed builds,
- has data for revisions not listed above (e.g. chromium@471907)

Any idea what happened?
 
Yeah this output isn't quite right. What's actually happening is:

- does not appear to have any failed builds,

These are actually inconclusive results rather than failed builds. I think the bisect output formatter is getting tripped up; I can fix that.

- has data for revisions not listed above (e.g. chromium@471907)

The JSON output has data for every revision in the range; a revision just might not be displayed in the posted output if it's not relevant (e.g. an untested revision, a failed build that didn't impede the bisect, etc.). In this case, some tested revisions such as chromium@471907 gave inconclusive results, but since chromium@471909 and chromium@471899 were conclusive (and marked as bad), the output didn't include them.
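
To make that display rule concrete, here is a rough Python sketch. The function name, the status strings, and the exact rule (show an inconclusive revision only when it sits between the last good and the first bad revision) are assumptions for illustration, not the actual bisect formatter:

```python
def rows_to_display(results):
    """Pick which revisions appear in the posted table.

    `results` is a list of (revision, status) tuples ordered by revision,
    where status is "good", "bad", "inconclusive", or "untested".
    Assumed rule: conclusive rows are always shown; inconclusive rows are
    shown only if they lie strictly between the last good and first bad
    revision, i.e. they stop the range from being narrowed further.
    """
    statuses = [status for _, status in results]
    last_good = max(i for i, s in enumerate(statuses) if s == "good")
    first_bad = min(i for i, s in enumerate(statuses) if s == "bad")

    shown = []
    for i, (revision, status) in enumerate(results):
        conclusive = status in ("good", "bad")
        blocks_narrowing = (
            status == "inconclusive" and last_good < i < first_bad
        )
        if conclusive or blocks_narrowing:
            shown.append((revision, status))
    return shown


# Statuses taken from this thread (a subset of the revisions).
results = [
    (471891, "good"),
    (471892, "inconclusive"),
    (471893, "inconclusive"),
    (471894, "inconclusive"),
    (471895, "bad"),
    (471899, "bad"),
    (471907, "inconclusive"),
    (471909, "bad"),
]
shown = rows_to_display(results)
```

Under this assumed rule, chromium@471892 through chromium@471894 stay in the table while chromium@471907 is dropped, which matches what was posted.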
Hmm, yes. That was super confusing. :-/

What does "inconclusive" mean in this context? Not sure if there are some easy fixes to apply and at least make the language a bit clearer.

Thankfully, none of this will be an issue in the Pinpoint world.
I poked through compare_samples a bit more. This seems like a corner case, and, looking at the logic, possibly a bug. "Inconclusive" in this context means that, even with the maximum number of samples, the revision couldn't be categorized as either good or bad.

compare_samples returns significant, insignificant, or needs_more_data, with the expectation that you'll keep running the revisions until each has the maximum number of samples. That only ever happens for the first 2 reference values; afterwards, only the revision under test is re-run to gather more samples.

So that would imply that it should have bailed out earlier with an inconclusive result exception, rather than continuing.
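
As a minimal sketch of how a three-way comparison can end up "inconclusive", assuming a Welch-style t statistic with made-up thresholds as a stand-in for whatever test compare_samples really uses (the function names, thresholds, and sample values below are all hypothetical):

```python
import statistics
from math import sqrt

SIGNIFICANT = "significant"
INSIGNIFICANT = "insignificant"
NEEDS_MORE_DATA = "needs_more_data"


def compare_samples_sketch(a, b, hi=3.0, lo=1.0):
    """Three-way comparison of two sample lists (stand-in statistic)."""
    if len(a) < 2 or len(b) < 2:
        return NEEDS_MORE_DATA
    diff = abs(statistics.mean(a) - statistics.mean(b))
    std_err = sqrt(statistics.variance(a) / len(a)
                   + statistics.variance(b) / len(b))
    if std_err == 0:
        return SIGNIFICANT if diff > 0 else INSIGNIFICANT
    t = diff / std_err
    if t >= hi:
        return SIGNIFICANT
    if t <= lo:
        return INSIGNIFICANT
    return NEEDS_MORE_DATA


def classify_revision(good_ref, bad_ref, next_sample, max_samples=14):
    """Re-run only the revision under test until it is clearly different
    from one reference, or until max_samples is reached."""
    samples = []
    while len(samples) < max_samples:
        samples.append(next_sample())
        if compare_samples_sketch(good_ref, samples) == SIGNIFICANT:
            return "bad"   # clearly differs from the known-good reference
        if compare_samples_sketch(bad_ref, samples) == SIGNIFICANT:
            return "good"  # clearly differs from the known-bad reference
    return "inconclusive"  # hit the sample cap without a verdict


# Hypothetical reference samples for the good and bad ends of the range.
good_ref = [100, 101, 99, 100, 102, 98]
bad_ref = [110, 111, 109, 110, 112, 108]
```

A revision whose samples are noisy and sit between the two references never reaches the "significant" threshold against either one, so it runs out of samples and is reported as inconclusive rather than good or bad.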

Yeah this will be a lot simpler in Pinpoint, given that you'd be able to see the problem quickly.
Hmm, that's not quite right: it's fine if only the revision under test is re-run, and there's a check for that specifically.

I can try to clear up the message a bit. Perhaps something like:


=== BISECT JOB RESULTS ===
Perf regression found but unable to narrow commit range

One or more revisions couldn't be conclusively categorized as good or bad, preventing the bisect from narrowing the range further.


Bisect Details
  Configuration: android_webview_arm64_aosp_perf_bisect
  Benchmark    : system_health.webview_startup_multiprocess
  Metric       : webview_startup_wall_time_avg/blank_about/blank_about_blank
  Change       : 5.17% | 156.028071429 -> 162.956571429

Suspected Commit Range
  4 commits in range
  https://chromium.googlesource.com/chromium/src/+log/a606d6e0020c013eee1862afcd4018338588abf7..ed9ec796e62619560c0a8ca788f3ee1dc49e7c07


Revision             Result                  N
chromium@471824      156.028 +- 20.865       14       good
chromium@471891      156.495 +- 19.1133      14       good
chromium@471892      ---                     ---      test couldn't determine good or bad
chromium@471893      ---                     ---      test couldn't determine good or bad
chromium@471894      ---                     ---      test couldn't determine good or bad
chromium@471895      162.24 +- 29.1168       14       bad
chromium@471899      167.241 +- 7.60149      6        bad
chromium@471909      162.427 +- 9.67142      9        bad
chromium@471924      162.622 +- 23.0431      14       bad
chromium@471956      162.957 +- 22.8165      14       bad


Even if the test "couldn't determine good or bad", it should probably still be fine to post the result values, e.g. something like:

Revision             Result                  N
chromium@471824      156.028 +- 20.865       14       good
chromium@471891      156.495 +- 19.1133      14       good
chromium@471892      159.123 +- 60.1234      14       undecided
chromium@471893      159.123 +- 60.1234      14       undecided
chromium@471894      159.123 +- 60.1234      14       undecided
chromium@471895      162.24 +- 29.1168       14       bad
chromium@471899      167.241 +- 7.60149      6        bad
chromium@471909      162.427 +- 9.67142      9        bad
chromium@471924      162.622 +- 23.0431      14       bad
chromium@471956      162.957 +- 22.8165      14       bad
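
A small Python sketch of a row formatter that would produce a table like the one above, keeping the measured values for undecided revisions. The function name and the column widths are assumptions inferred from the posted output, not the real formatter, and the 159.123 +- 60.1234 figures are the placeholder values from the example above:

```python
def format_row(revision, mean, std, n, verdict):
    """Format one table row; mean/std may be None when nothing was measured."""
    if mean is None or std is None:
        result, count = "---", "---"
    else:
        # "g" formatting keeps up to 6 significant digits, as in the posted table.
        result = f"{mean:g} +- {std:g}"
        count = str(n)
    return f"{revision:<21}{result:<24}{count:<9}{verdict}"


header = f"{'Revision':<21}{'Result':<24}N"
rows = [
    format_row("chromium@471891", 156.495, 19.1133, 14, "good"),
    format_row("chromium@471892", 159.123, 60.1234, 14, "undecided"),
    format_row("chromium@471895", 162.24, 29.1168, 14, "bad"),
]
print("\n".join([header] + rows))
```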

Blocking: 769710
Status: Archived (was: Assigned)
