Posted bisect output does not match raw build data
Issue description

On crbug.com/763955#3 the bisect posted:

=== BISECT JOB RESULTS ===
Perf regression found but unable to narrow commit range

Build failures prevented the bisect from narrowing the range further.

Bisect Details
  Configuration: android_webview_arm64_aosp_perf_bisect
  Benchmark    : system_health.webview_startup_multiprocess
  Metric       : webview_startup_wall_time_avg/blank_about/blank_about_blank
  Change       : 5.17% | 156.028071429 -> 162.956571429

Suspected Commit Range
  4 commits in range
  https://chromium.googlesource.com/chromium/src/+log/a606d6e0020c013eee1862afcd4018338588abf7..ed9ec796e62619560c0a8ca788f3ee1dc49e7c07

Revision           Result                 N
chromium@471824    156.028 +- 20.865      14     good
chromium@471891    156.495 +- 19.1133     14     good
chromium@471892    ---                    ---    build failure
chromium@471893    ---                    ---    build failure
chromium@471894    ---                    ---    build failure
chromium@471895    162.24 +- 29.1168      14     bad
chromium@471899    167.241 +- 7.60149     6      bad
chromium@471909    162.427 +- 9.67142     9      bad
chromium@471924    162.622 +- 23.0431     14     bad
chromium@471956    162.957 +- 22.8165     14     bad

To Run This Test
  src/tools/perf/run_benchmark -v --browser=android-webview --output-format=chartjson --upload-results --pageset-repeat=1 --also-run-disabled-tests --story-filter=blank.about.blank system_health.webview_startup_multiprocess

More information on addressing performance regressions:
  http://g.co/ChromePerformanceRegressions

Debug information about this bisect:
  https://chromeperf.appspot.com/buildbucket_job_status/8968733382489024640
==========================

However the corresponding build at:
https://luci-milo.appspot.com/buildbot/tryserver.chromium.perf/android_webview_arm64_aosp_perf_bisect/1677
- does not appear to have any failed builds,
- has data for revisions not listed above (e.g. chromium@471907)

Any idea what happened?
Sep 25 2017
Hmm, yes. That was super confusing. :-/ What does "inconclusive" mean in this context? Not sure if there are some easy fixes we could apply to at least make the language a bit clearer. Thankfully, none of this will be an issue in the Pinpoint world.
Sep 25 2017
Poking through compare_samples a bit more. This seems like a corner case, and possibly a bug, looking at the logic. Inconclusive in this context means that, given the max samples, it couldn't categorize the revision as either good or bad. compare_samples returns significant, insignificant, or needs_more_data, with the expectation that you'll rerun the revisions until you have the max number of samples for each. That only ever happens for the first 2 reference values; afterwards, only the revision under test is re-run to gather more samples. That would imply that it should have bailed out earlier with an inconclusive result exception rather than continuing on. Yeah, this will be a lot simpler in Pinpoint, given that you'd be able to see the problem quickly.
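To make the three-way outcome described above concrete, here is a rough sketch of how a revision might end up good, bad, or inconclusive. This is not the actual compare_samples code: the function names, the Welch t-statistic, and the fixed threshold of 2.0 are all assumptions chosen for illustration, and the real tool may use a different statistical test entirely.

```python
import statistics
from math import sqrt

SIGNIFICANT = "significant"
INSIGNIFICANT = "insignificant"
NEEDS_MORE_DATA = "needs_more_data"


def welch_t(a, b):
    """Welch t-statistic for two samples (stand-in for the real test)."""
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    denom = sqrt(va + vb)
    diff = statistics.mean(a) - statistics.mean(b)
    if denom == 0:
        return 0.0 if diff == 0 else float("inf")
    return diff / denom


def compare(reference, under_test, max_samples, threshold=2.0):
    """Hypothetical three-way comparison of a revision against one reference."""
    enough = len(reference) >= 2 and len(under_test) >= 2
    if enough and abs(welch_t(reference, under_test)) >= threshold:
        return SIGNIFICANT
    # Only the revision under test is re-run, so only its count is checked.
    if len(under_test) >= max_samples:
        return INSIGNIFICANT
    return NEEDS_MORE_DATA


def classify_revision(good_ref, bad_ref, under_test, max_samples):
    """Label a revision by comparing it against both reference revisions."""
    vs_good = compare(good_ref, under_test, max_samples)
    vs_bad = compare(bad_ref, under_test, max_samples)
    if vs_good == SIGNIFICANT and vs_bad != SIGNIFICANT:
        return "bad"
    if vs_bad == SIGNIFICANT and vs_good != SIGNIFICANT:
        return "good"
    if NEEDS_MORE_DATA in (vs_good, vs_bad):
        return NEEDS_MORE_DATA
    # Max samples reached and still no clear side: this is the
    # "inconclusive" case the bisect reported as a build failure.
    return "inconclusive"
```

In this sketch a noisy revision that sits between the two references keeps returning needs_more_data until it hits max_samples, and only then becomes inconclusive.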
Sep 25 2017
Hmm, that's not quite right. It's fine if it's only the revision under test that's rerun; there's a check for that specifically. I can try to clear up the message a bit. Perhaps something like:

=== BISECT JOB RESULTS ===
Perf regression found but unable to narrow commit range

One or more revisions couldn't be conclusively categorized as good or bad, preventing the bisect from narrowing the range further.

Bisect Details
  Configuration: android_webview_arm64_aosp_perf_bisect
  Benchmark    : system_health.webview_startup_multiprocess
  Metric       : webview_startup_wall_time_avg/blank_about/blank_about_blank
  Change       : 5.17% | 156.028071429 -> 162.956571429

Suspected Commit Range
  4 commits in range
  https://chromium.googlesource.com/chromium/src/+log/a606d6e0020c013eee1862afcd4018338588abf7..ed9ec796e62619560c0a8ca788f3ee1dc49e7c07

Revision           Result                 N
chromium@471824    156.028 +- 20.865      14     good
chromium@471891    156.495 +- 19.1133     14     good
chromium@471892    ---                    ---    test couldn't determine good or bad
chromium@471893    ---                    ---    test couldn't determine good or bad
chromium@471894    ---                    ---    test couldn't determine good or bad
chromium@471895    162.24 +- 29.1168      14     bad
chromium@471899    167.241 +- 7.60149     6      bad
chromium@471909    162.427 +- 9.67142     9      bad
chromium@471924    162.622 +- 23.0431     14     bad
chromium@471956    162.957 +- 22.8165     14     bad
Sep 26 2017
Even if the test "couldn't determine good or bad", maybe it should be fine to post the result values? e.g. maybe something like:

Revision           Result                 N
chromium@471824    156.028 +- 20.865      14     good
chromium@471891    156.495 +- 19.1133     14     good
chromium@471892    159.123 +- 60.1234     14     undecided
chromium@471893    159.123 +- 60.1234     14     undecided
chromium@471894    159.123 +- 60.1234     14     undecided
chromium@471895    162.24 +- 29.1168      14     bad
chromium@471899    167.241 +- 7.60149     6      bad
chromium@471909    162.427 +- 9.67142     9      bad
chromium@471924    162.622 +- 23.0431     14     bad
chromium@471956    162.957 +- 22.8165     14     bad
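For what it's worth, a formatter along these lines could print the measured values whenever they exist, regardless of verdict, and fall back to "---" only when there really is no data. This is just a sketch: RevisionResult, format_row and format_table are hypothetical names, not anything from the bisect code, and the 159.123 +- 60.1234 figures are the placeholder numbers from the suggestion above, not real measurements.

```python
from typing import List, NamedTuple, Optional


class RevisionResult(NamedTuple):
    """Hypothetical per-revision record; not the bisect tool's own type."""
    revision: str
    mean: Optional[float]
    std: Optional[float]
    n: int
    verdict: str


def format_row(r: RevisionResult) -> str:
    """Show values whenever they were measured, whatever the verdict is."""
    if r.mean is None or r.std is None:
        value, count = "---", "---"
    else:
        value, count = f"{r.mean:g} +- {r.std:g}", str(r.n)
    return f"{r.revision:<19}{value:<23}{count:<7}{r.verdict}"


def format_table(results: List[RevisionResult]) -> str:
    """Render the header plus one row per revision."""
    header = f"{'Revision':<19}{'Result':<23}N"
    return "\n".join([header] + [format_row(r) for r in results])


if __name__ == "__main__":
    print(format_table([
        RevisionResult("chromium@471891", 156.495, 19.1133, 14, "good"),
        # Placeholder values, as in the suggested table above.
        RevisionResult("chromium@471892", 159.123, 60.1234, 14, "undecided"),
        RevisionResult("chromium@471895", 162.24, 29.1168, 14, "bad"),
    ]))
```

Keeping the verdict separate from the presence of data means an undecided revision no longer looks the same as one that never produced results.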
Oct 10 2017
Oct 4
Comment 1 by simonhatch@chromium.org, Sep 25 2017