Higher-is-better score vs lower-is-better time results on Perf Dashboard
Issue description

Link to example: https://chromeperf.appspot.com/report?sid=69c8dd1a68cae4fe359b701bb58b738e147e929d57d7609ffa103df858c469b8

I'm trying to output 'ns'-based results (for nanoseconds) and they seem to be showing up as "score (higher is better)". They should be lower-is-better, and I would want to get alerts when the scores increase. Any advice on how to fix this?
Oct 31
I added "ns" (smaller is better) to the chromeperf dashboard's UnitsToDirection, but that will only help if the uploaded data specifies units: ns. I'm not sure that's the case, since the chart says "score". I found and ran angle_perftests in chromium/src, but I don't know how to convert its output to the chartjson that is uploaded to the chromeperf dashboard. jmadill or nednguyen, do either of you know how angle_perftests' output is converted to chartjson?
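For context, the UnitsToDirection change described here is conceptually a lookup from a unit string to an improvement direction. A minimal, purely illustrative sketch (the names and the fallback behavior are assumptions, not the dashboard's actual code):

```python
# Illustrative sketch of a units-to-improvement-direction lookup, modeled on
# the idea behind the dashboard's UnitsToDirection; names are hypothetical.
UP, DOWN = 'up', 'down'  # higher-is-better vs. lower-is-better

UNITS_TO_DIRECTION = {
    'score': UP,   # generic benchmark score: higher is better
    'ms': DOWN,    # time units: lower is better
    'ns': DOWN,    # the unit being added for this issue
}

def improvement_direction(units):
    # Unknown units fall back to higher-is-better here, which mirrors the
    # symptom in this issue: 'ns' data rendered as "higher is better".
    return UNITS_TO_DIRECTION.get(units, UP)
```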
Oct 31
I think it's through Emily's chartjson converter script
Nov 1
Maybe from this file? But I'm not sure where the 'score' comes from: https://cs.chromium.org/chromium/src/tools/perf/process_perf_results.py
Nov 1
The conversion is actually done on the swarming bot before the recipe merge step calls the src side upload script. https://cs.chromium.org/chromium/src/tools/perf/generate_legacy_perf_dashboard_json.py
Nov 1
Here's an example run: https://chrome-swarming.appspot.com/task?id=40e6137b61c9f010&refresh=10

The perf results are, for example:

*RESULT VulkanPipelineCachePerf: nanoSecPerIteration= 2343.2714013482 ns
*RESULT VulkanPipelineCachePerf: score= 4268 score

Emily, do you know why this might be recording everything as 'score'? Maybe there's a collision in the units for the "subtest"?
Nov 2
The unit ("score" or "ns") is passed to printResult:
https://cs.chromium.org/chromium/src/third_party/angle/src/tests/perf_tests/ANGLEPerfTest.cpp?q=nanoSecPerIteration&sq=package:chromium&g=0&l=233
https://cs.chromium.org/chromium/src/third_party/angle/src/tests/perf_tests/ANGLEPerfTest.cpp?q=nanoSecPerIteration&sq=package:chromium&g=0&l=229
printResult prints lines like this:
*RESULT DrawCallPerf_vulkan_null: score= 0 score
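As a rough illustration, a *RESULT line in that format can be picked apart with a regex. This is a sketch, not the dashboard's actual parser (the real parsing happens in generate_legacy_perf_dashboard_json.py):

```python
import re

# Rough sketch of parsing a "*RESULT <graph>: <trace>= <value> <units>" line
# as printed by printResult. The regex is an approximation for illustration.
RESULT_RE = re.compile(
    r'\*RESULT (?P<graph>[^:]+): (?P<trace>[^=]+)= '
    r'(?P<value>[0-9.]+) (?P<units>\S+)$')

def parse_result_line(line):
    match = RESULT_RE.match(line)
    if match is None:
        return None
    return (match.group('graph'), match.group('trace'),
            float(match.group('value')), match.group('units'))
```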
Those result lines are transformed into chartjson by generate_legacy_perf_dashboard_json.py. That's a library module, not a script, so I had to go figure out which script to run that calls it:
./testing/scripts/run_gtest_perf_test.py --isolated-script-test-output=iso.out --isolated-script-test-perf-output=iso.perf.out ./out/Default/angle_perftests
I don't see any nanoSecPerIteration values. Do I need to do something special to get them?
Is there a way to pass gtest_filter? It took a long time to run.
Nov 2
Maybe try updating your tree and running gclient sync. The nanoSecPerIteration values should be apparent at top of tree; they were added maybe a week ago. Try using --gtest_filter=DrawCall*gl or DrawCall*gl_null
Nov 2
Thanks! This works:
./testing/scripts/run_gtest_perf_test.py --isolated-script-test-output=iso.out --isolated-script-test-perf-output=charts.json ./out/Default/angle_perftests --gtest_filter='DrawCall*'
Here's what generate_legacy_perf_dashboard_json writes to charts.json:
{
"DrawCallPerf_gl_render_to_texture_null": {
"traces": {
"score": [
"32502.0",
"0.0"
],
"nanoSecPerIteration": [
"307.673437562",
"0.0"
]
},
"units": "score",
"important": [
"nanoSecPerIteration",
"score"
]
},
"DrawCallPerf_vulkan_tex_change": {
"traces": {
"score": [
"21.0",
"0.0"
],
"nanoSecPerIteration": [
"476218.899286",
"0.0"
]
},
"units": "score",
"important": [
"nanoSecPerIteration",
"score"
]
},
...
That json is saying there's a measurement "DrawCallPerf_gl_render_to_texture_null" with units "score", and that measurement is recorded by two test cases (fka "traces" or "stories") named "score" and "nanoSecPerIteration". That's not what you want it to say.
You want charts.json to say that there's one measurement "score" with units "score" that is recorded in many test cases such as "DrawCallPerf_gl_render_to_texture_null", and there's another measurement "nanoSecPerIteration" with units "ns" that is recorded in many test cases such as "DrawCallPerf_gl_render_to_texture_null".
The json to say that would look like this:
{
"score": {
"units": "score",
"traces": {"DrawCallPerf_gl_render_to_texture_null": [42], ...}
},
"nanoSecPerIteration": {
"units": "ns",
"traces": {"DrawCallPerf_gl_render_to_texture_null": [42], ...}
}
}
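Restructuring the current output into that shape amounts to swapping the nesting of measurement and test case. A hedged sketch of the inversion (the measurement-to-units mapping here is an assumption for illustration; the real units would come from the *RESULT lines themselves):

```python
# Invert {test_case: {"traces": {measurement: values}}} into
# {measurement: {"units": ..., "traces": {test_case: values}}}.
# MEASUREMENT_UNITS is illustrative only.
MEASUREMENT_UNITS = {'score': 'score', 'nanoSecPerIteration': 'ns'}

def invert_charts(old_charts):
    new_charts = {}
    for test_case, chart in old_charts.items():
        for measurement, values in chart['traces'].items():
            entry = new_charts.setdefault(measurement, {
                'units': MEASUREMENT_UNITS.get(measurement, 'score'),
                'traces': {},
            })
            entry['traces'][test_case] = values
    return new_charts
```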
Changing the structure will cause the test paths on the dashboard to change from master/bot/angle_perftests/DrawCallPerf_gl_render_to_texture_null/nanoSecPerIteration to master/bot/angle_perftests/nanoSecPerIteration/DrawCallPerf_gl_render_to_texture_null. If necessary, please file a Speed>Dashboard bug to migrate the existing data from the old test paths to the new test paths.
If generate_legacy_perf_dashboard_json is going to be changed to do that, it might be a good time to switch to Histograms. We have a nice library so you don't have to think about the json structure:
https://github.com/catapult-project/catapult/blob/master/tracing/tracing/value/histogram_set.py
Here's a doc to get started:
https://docs.google.com/document/d/1vWs1KOLDfv9YC8M_xfmHViwDxDC6wEWKjkQK1-nJmNs/edit
In HistogramSet, you'd add a Histogram named "score" with units=unitlessNumber and a Diagnostic named "stories" containing strings such as "DrawCallPerf_gl_render_to_texture_null". When those Histograms are uploaded to the chromeperf dashboard, it would add the data to a timeseries whose test path looks like master/bot/angle_perftests/score/DrawCallPerf_gl_render_to_texture_null. If necessary, please file a Speed>Dashboard bug to migrate the existing data from the old test paths to the new test paths.
I don't have time to rewrite generate_legacy_perf_dashboard_json for you, but I'd be happy to VC, guide, and review CLs.
Nov 16
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/3ca3fb27df6b1e3b2a7bb200e50e7361395995af

commit 3ca3fb27df6b1e3b2a7bb200e50e7361395995af
Author: Jamie Madill <jmadill@chromium.org>
Date: Fri Nov 16 19:10:58 2018

Add multi results in legacy perf dashboard script.

Currently when the script reads multiple results it outputs the last value read as well as a zero placeholder for standard deviation.

According to benjhayden@ we can return all results in an array. This change does just that and adds a test. It also updates the expected unit test output.

Bug: 900677
Change-Id: Icc38808ea83caddbdf6b8582e2ddc34db4f8cdf0
Reviewed-on: https://chromium-review.googlesource.com/c/1338468
Reviewed-by: Ned Nguyen <nednguyen@google.com>
Commit-Queue: Jamie Madill <jmadill@chromium.org>
Cr-Commit-Position: refs/heads/master@{#608882}

[modify] https://crrev.com/3ca3fb27df6b1e3b2a7bb200e50e7361395995af/tools/perf/generate_legacy_perf_dashboard_json.py
[modify] https://crrev.com/3ca3fb27df6b1e3b2a7bb200e50e7361395995af/tools/perf/testdata/artificial_graph-summary.dat
[modify] https://crrev.com/3ca3fb27df6b1e3b2a7bb200e50e7361395995af/tools/perf/testdata/commit_charge-summary.dat
[modify] https://crrev.com/3ca3fb27df6b1e3b2a7bb200e50e7361395995af/tools/perf/testdata/graphing_processor.log
[modify] https://crrev.com/3ca3fb27df6b1e3b2a7bb200e50e7361395995af/tools/perf/testdata/processes-summary.dat
[modify] https://crrev.com/3ca3fb27df6b1e3b2a7bb200e50e7361395995af/tools/perf/testdata/vm_final_browser-summary.dat
[modify] https://crrev.com/3ca3fb27df6b1e3b2a7bb200e50e7361395995af/tools/perf/testdata/vm_final_total-summary.dat
[modify] https://crrev.com/3ca3fb27df6b1e3b2a7bb200e50e7361395995af/tools/perf/testdata/ws_final_browser-summary.dat
[modify] https://crrev.com/3ca3fb27df6b1e3b2a7bb200e50e7361395995af/tools/perf/testdata/ws_final_total-summary.dat
Nov 17
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/671499f4d257cab081e4342fa1eae17e930b0ac2

commit 671499f4d257cab081e4342fa1eae17e930b0ac2
Author: Jamie Madill <jmadill@chromium.org>
Date: Sat Nov 17 02:27:31 2018

Revert "Add multi results in legacy perf dashboard script."

This reverts commit 3ca3fb27df6b1e3b2a7bb200e50e7361395995af.

Reason for revert: Seems to be the incorrect way to handle multi-results. Instead of returning an array of values, we should compute the average and return a standard deviation as the last result. https://chromeperf.appspot.com/report?sid=3b11bf2aa2cf2281472588fed4dfc1e0a9f514ff10bd9db09cb208c5306ac3c9

Original change's description:
> Add multi results in legacy perf dashboard script.
>
> Currently when the script reads multiple results it outputs the last value read as well as a zero placeholder for standard deviation.
>
> According to benjhayden@ we can return all results in an array. This change does just that and adds a test. It also updates the expected unit test output.
>
> Bug: 900677
> Change-Id: Icc38808ea83caddbdf6b8582e2ddc34db4f8cdf0
> Reviewed-on: https://chromium-review.googlesource.com/c/1338468
> Reviewed-by: Ned Nguyen <nednguyen@google.com>
> Commit-Queue: Jamie Madill <jmadill@chromium.org>
> Cr-Commit-Position: refs/heads/master@{#608882}

TBR=jmadill@chromium.org,benjhayden@chromium.org,nednguyen@google.com,eyaich@chromium.org
Change-Id: I24fe0fae6607bf4d1bf8fb55d2c4f78d18a74b71
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Bug: 900677
Reviewed-on: https://chromium-review.googlesource.com/c/1341087
Reviewed-by: Jamie Madill <jmadill@chromium.org>
Commit-Queue: Jamie Madill <jmadill@chromium.org>
Cr-Commit-Position: refs/heads/master@{#609076}

[modify] https://crrev.com/671499f4d257cab081e4342fa1eae17e930b0ac2/tools/perf/generate_legacy_perf_dashboard_json.py
[modify] https://crrev.com/671499f4d257cab081e4342fa1eae17e930b0ac2/tools/perf/testdata/artificial_graph-summary.dat
[modify] https://crrev.com/671499f4d257cab081e4342fa1eae17e930b0ac2/tools/perf/testdata/commit_charge-summary.dat
[modify] https://crrev.com/671499f4d257cab081e4342fa1eae17e930b0ac2/tools/perf/testdata/graphing_processor.log
[modify] https://crrev.com/671499f4d257cab081e4342fa1eae17e930b0ac2/tools/perf/testdata/processes-summary.dat
[modify] https://crrev.com/671499f4d257cab081e4342fa1eae17e930b0ac2/tools/perf/testdata/vm_final_browser-summary.dat
[modify] https://crrev.com/671499f4d257cab081e4342fa1eae17e930b0ac2/tools/perf/testdata/vm_final_total-summary.dat
[modify] https://crrev.com/671499f4d257cab081e4342fa1eae17e930b0ac2/tools/perf/testdata/ws_final_browser-summary.dat
[modify] https://crrev.com/671499f4d257cab081e4342fa1eae17e930b0ac2/tools/perf/testdata/ws_final_total-summary.dat
Nov 19
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/ce4bc3cb864694fe5ed543ec4047d7c639d4e9b0

commit ce4bc3cb864694fe5ed543ec4047d7c639d4e9b0
Author: Jamie Madill <jmadill@chromium.org>
Date: Mon Nov 19 17:26:34 2018

Re-land: Add multi results in legacy perf dashboard script.

Currently when the script reads multiple results it overwrites the current result with the last value read. It does not compute a mean or standard deviation.

This change updates the script to keep a list of values. It then computes a mean and standard deviation from the list.

The first attempt at fixing this returned a list of values. This was incorrect. Also includes a comment and an assert to prevent this.

Bug: 900677
Change-Id: Ided17ea36478128e003e6d9317ca5fb15128415a
Reviewed-on: https://chromium-review.googlesource.com/c/1341211
Commit-Queue: Jamie Madill <jmadill@chromium.org>
Reviewed-by: Ned Nguyen <nednguyen@google.com>
Cr-Commit-Position: refs/heads/master@{#609342}

[modify] https://crrev.com/ce4bc3cb864694fe5ed543ec4047d7c639d4e9b0/tools/perf/generate_legacy_perf_dashboard_json.py
[modify] https://crrev.com/ce4bc3cb864694fe5ed543ec4047d7c639d4e9b0/tools/perf/testdata/artificial_graph-summary.dat
[modify] https://crrev.com/ce4bc3cb864694fe5ed543ec4047d7c639d4e9b0/tools/perf/testdata/graphing_processor.log
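The aggregation this re-landed CL describes (keep every value for a trace, then report a mean plus a standard deviation rather than the last value read) can be sketched as follows. This is an illustration, not the script's actual code, and it uses the population standard deviation as one plausible choice:

```python
import math

# Collect all values for a (graph, trace) pair, then summarize them as a
# mean and a (population) standard deviation, instead of overwriting the
# result with the last value read.
def summarize(values):
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, math.sqrt(variance)
```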
Nov 26
The ANGLE dashboards are now working properly with lower-is-better: https://chromeperf.appspot.com/report?sid=ca17599fd002887de77c5d653606f8ca2eeb0df9163effd474ae501e6271a991 They also have the correct statistics calculations to smooth out variance from the CL in comment #14. I think I'm going to close this out without doing the other changes I had planned. The script is a bit complicated and would cause additional work to migrate the dashboards and notify the various benchmark owners.
Nov 26
jmadill@: do you mind filing bugs for the clean up work so we can backlog it?
Jan 7
ned: filed issue 919503. I noticed the new dashboards were picked up with what seemed to be the correct test/measurement, so I didn't file a bug about inverting the data structure. It seems pretty low priority, as the old dashboards will be deprecated. Or it could be folded into issue 919503 as well if I'm mistaken.
Comment 1 by jmad...@chromium.org, Oct 31