
Issue 900677


Issue metadata

Status: Fixed
Owner:
Closed: Nov 26
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Windows
Pri: 2
Type: Bug

Blocking:
issue 919503
issue angleproject:2923




Higher-is-better score vs lower-is-better time results on Perf Dashboard

Reported by jmad...@chromium.org, Oct 31

Issue description

Link to example:

https://chromeperf.appspot.com/report?sid=69c8dd1a68cae4fe359b701bb58b738e147e929d57d7609ffa103df858c469b8

I'm trying to output 'ns'-based results (for nanoseconds), and they seem to be showing up as "score (higher is better)". They should be lower-is-better, and I want to get alerts when the scores increase.

Any advice on how to fix this?
 
Cc: benjhayden@chromium.org
CC'ing Ben, who may know the dashboard well.
Cc: nednguyen@chromium.org
Owner: jmad...@chromium.org
I added "ns" (smaller is better) to the chromeperf dashboard's UnitsToDirection, but that will only help if the uploaded data specifies units: ns. I'm not sure that's the case, since the chart says "score".
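(For context: the dashboard keeps a table mapping units to an improvement direction; conceptually it behaves like the Python sketch below. The names and the fallback are illustrative, not the dashboard's actual code.)

# Illustrative sketch of a units-to-improvement-direction table;
# names and the default are made up, not the dashboard's real code.
UP, DOWN = 'up', 'down'  # higher-is-better vs. lower-is-better

UNITS_TO_DIRECTION = {
    'score': UP,   # generic score: bigger means better
    'ms': DOWN,    # elapsed time: smaller means better
    'ns': DOWN,    # the entry added for this bug
}

def improvement_direction(units):
    # Assumed fallback: data uploaded with units "score" resolves to
    # higher-is-better regardless of what the test actually measured.
    return UNITS_TO_DIRECTION.get(units, UP)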
I found and ran angle_perftests in chromium/src, but I don't know how to convert it to the chartjson that is uploaded to the chromeperf dashboard.
jmadill or nednguyen, do either of you know how angle_perftests' output is converted to chartjson?
Cc: eyaich@chromium.org
I think it's through Emily's chartjson converter script.
Cc: -nednguyen@chromium.org nedngu...@google.com
Maybe from this file? But I'm not sure where the 'score' happens.

https://cs.chromium.org/chromium/src/tools/perf/process_perf_results.py
The conversion is actually done on the swarming bot, before the recipe merge step calls the src-side upload script.

https://cs.chromium.org/chromium/src/tools/perf/generate_legacy_perf_dashboard_json.py
Here's an example run:

https://chrome-swarming.appspot.com/task?id=40e6137b61c9f010&refresh=10

The perf results are, for example:

*RESULT VulkanPipelineCachePerf: nanoSecPerIteration= 2343.2714013482 ns
*RESULT VulkanPipelineCachePerf: score= 4268 score

Emily, do you know why this might be recording everything as 'score'? Maybe there's a collision in the units for the "subtest"?
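(For reference, each output line has the shape *RESULT <chart>: <trace>= <value> <units>. A minimal parsing sketch in Python, just to pin down the format; this is not the actual script's code:)

import re

# Matches lines like:
#   *RESULT VulkanPipelineCachePerf: nanoSecPerIteration= 2343.27 ns
RESULT_RE = re.compile(
    r'\*RESULT (?P<chart>[^:]+): (?P<trace>[^=]+)= '
    r'(?P<value>[\d.]+) (?P<units>\S+)')

def parse_result_line(line):
    m = RESULT_RE.match(line.strip())
    if not m:
        return None
    return (m.group('chart'), m.group('trace'),
            float(m.group('value')), m.group('units'))

print(parse_result_line(
    '*RESULT VulkanPipelineCachePerf: score= 4268 score'))
# -> ('VulkanPipelineCachePerf', 'score', 4268.0, 'score')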
The unit ("score" or "ns") is passed to printResult:
https://cs.chromium.org/chromium/src/third_party/angle/src/tests/perf_tests/ANGLEPerfTest.cpp?q=nanoSecPerIteration&sq=package:chromium&g=0&l=233
https://cs.chromium.org/chromium/src/third_party/angle/src/tests/perf_tests/ANGLEPerfTest.cpp?q=nanoSecPerIteration&sq=package:chromium&g=0&l=229

printResult prints lines like this:
*RESULT DrawCallPerf_vulkan_null: score= 0 score

Those result lines are transformed into chartjson by generate_legacy_perf_dashboard_json.py. That's a library module, not a script, so I had to go figure out which script to run that calls it:

./testing/scripts/run_gtest_perf_test.py --isolated-script-test-output=iso.out --isolated-script-test-perf-output=iso.perf.out ./out/Default/angle_perftests

I don't see any nanoSecPerIteration values. Do I need to do something special to get them?

Is there a way to pass gtest_filter? It took a long time to run.


Maybe try updating your tree and running gclient sync. The nanoSecPerIteration values should be present at top of tree. They were added maybe a week ago.

Try using --gtest_filter=DrawCall*gl or DrawCall*gl_null
Thanks! This works:

./testing/scripts/run_gtest_perf_test.py --isolated-script-test-output=iso.out --isolated-script-test-perf-output=charts.json ./out/Default/angle_perftests --gtest_filter='DrawCall*'

Here's what generate_legacy_perf_dashboard_json writes to charts.json:
{
  "DrawCallPerf_gl_render_to_texture_null": {
    "traces": {
      "score": [
        "32502.0",
        "0.0"
      ],
      "nanoSecPerIteration": [
        "307.673437562",
        "0.0"
      ]
    },
    "units": "score",
    "important": [
      "nanoSecPerIteration",
      "score"
    ]
  },
  "DrawCallPerf_vulkan_tex_change": {
    "traces": {
      "score": [
        "21.0",
        "0.0"
      ],
      "nanoSecPerIteration": [
        "476218.899286",
        "0.0"
      ]
    },
    "units": "score",
    "important": [
      "nanoSecPerIteration",
      "score"
    ]
  },
...

That json is saying there's a measurement "DrawCallPerf_gl_render_to_texture_null" with units "score", and that measurement is recorded by two test cases (fka "traces" or "stories") named "score" and "nanoSecPerIteration". That's not what you want it to say.
You want charts.json to say that there's one measurement "score" with units "score" that is recorded in many test cases such as "DrawCallPerf_gl_render_to_texture_null", and there's another measurement "nanoSecPerIteration" with units "ns" that is recorded in many test cases such as "DrawCallPerf_gl_render_to_texture_null".
The json to say that would look like this:
{"score": {"units": "score", "traces": {"DrawCallPerf_gl_render_to_texture_null": [42], ...}}, "nanoSecPerIteration": {"units": "ns", "traces": {"DrawCallPerf_gl_render_to_texture_null": [42], ...}}}

Changing the structure will cause the test paths on the dashboard to change from master/bot/angle_perftests/DrawCallPerf_gl_render_to_texture_null/nanoSecPerIteration to master/bot/angle_perftests/nanoSecPerIteration/DrawCallPerf_gl_render_to_texture_null. If necessary, please file a Speed>Dashboard bug to migrate the existing data from the old test paths to the new test paths.

If generate_legacy_perf_dashboard_json is going to be changed to do that, it might be a good time to switch to Histograms. We have a nice library so you don't have to think about the json structure:
https://github.com/catapult-project/catapult/blob/master/tracing/tracing/value/histogram_set.py
Here's a doc to get started:
https://docs.google.com/document/d/1vWs1KOLDfv9YC8M_xfmHViwDxDC6wEWKjkQK1-nJmNs/edit

In HistogramSet, you'd add a Histogram named "score" with units=unitlessNumber and a Diagnostic named "stories" containing strings such as "DrawCallPerf_gl_render_to_texture_null". When those Histograms are uploaded to the chromeperf dashboard, it would add the data to a timeseries whose test path looks like master/bot/angle_perftests/score/DrawCallPerf_gl_render_to_texture_null. If necessary, please file a Speed>Dashboard bug to migrate the existing data from the old test paths to the new test paths.
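(A rough sketch of that, assuming catapult is on PYTHONPATH; module paths and method names follow the histogram_set.py link above but may have shifted across revisions:)

from tracing.value import histogram, histogram_set
from tracing.value.diagnostics import generic_set, reserved_infos

# One Histogram per measurement; samples go in via AddSample.
# 'unitless' is the Python-side unit name assumed here (the doc
# calls it unitlessNumber).
hist = histogram.Histogram('score', 'unitless')
hist.AddSample(42)

hists = histogram_set.HistogramSet([hist])
# Attach the story name as a shared diagnostic so the dashboard can
# build the .../score/DrawCallPerf_gl_render_to_texture_null path.
hists.AddSharedDiagnosticToAllHistograms(
    reserved_infos.STORIES.name,
    generic_set.GenericSet(['DrawCallPerf_gl_render_to_texture_null']))

print(hists.AsDicts())  # json-serializable list for upload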

I don't have time to rewrite generate_legacy_perf_dashboard_json for you, but I'd be happy to VC, guide, and review CLs.
Status: Assigned (was: Untriaged)
Comment 12 by bugdroid1@chromium.org, Nov 16

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/3ca3fb27df6b1e3b2a7bb200e50e7361395995af

commit 3ca3fb27df6b1e3b2a7bb200e50e7361395995af
Author: Jamie Madill <jmadill@chromium.org>
Date: Fri Nov 16 19:10:58 2018

Add multi results in legacy perf dashboard script.

Currently when the script reads multiple results it outputs the
last value read, as well as a zero placeholder for standard deviation.

According to benjhayden@ we can return all results in an array.
This change does just that and adds a test. It also updates the
expected unit test output.

Bug:  900677 
Change-Id: Icc38808ea83caddbdf6b8582e2ddc34db4f8cdf0
Reviewed-on: https://chromium-review.googlesource.com/c/1338468
Reviewed-by: Ned Nguyen <nednguyen@google.com>
Commit-Queue: Jamie Madill <jmadill@chromium.org>
Cr-Commit-Position: refs/heads/master@{#608882}
[modify] https://crrev.com/3ca3fb27df6b1e3b2a7bb200e50e7361395995af/tools/perf/generate_legacy_perf_dashboard_json.py
[modify] https://crrev.com/3ca3fb27df6b1e3b2a7bb200e50e7361395995af/tools/perf/testdata/artificial_graph-summary.dat
[modify] https://crrev.com/3ca3fb27df6b1e3b2a7bb200e50e7361395995af/tools/perf/testdata/commit_charge-summary.dat
[modify] https://crrev.com/3ca3fb27df6b1e3b2a7bb200e50e7361395995af/tools/perf/testdata/graphing_processor.log
[modify] https://crrev.com/3ca3fb27df6b1e3b2a7bb200e50e7361395995af/tools/perf/testdata/processes-summary.dat
[modify] https://crrev.com/3ca3fb27df6b1e3b2a7bb200e50e7361395995af/tools/perf/testdata/vm_final_browser-summary.dat
[modify] https://crrev.com/3ca3fb27df6b1e3b2a7bb200e50e7361395995af/tools/perf/testdata/vm_final_total-summary.dat
[modify] https://crrev.com/3ca3fb27df6b1e3b2a7bb200e50e7361395995af/tools/perf/testdata/ws_final_browser-summary.dat
[modify] https://crrev.com/3ca3fb27df6b1e3b2a7bb200e50e7361395995af/tools/perf/testdata/ws_final_total-summary.dat

Comment 13 by bugdroid1@chromium.org, Nov 17

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/671499f4d257cab081e4342fa1eae17e930b0ac2

commit 671499f4d257cab081e4342fa1eae17e930b0ac2
Author: Jamie Madill <jmadill@chromium.org>
Date: Sat Nov 17 02:27:31 2018

Revert "Add multi results in legacy perf dashboard script."

This reverts commit 3ca3fb27df6b1e3b2a7bb200e50e7361395995af.

Reason for revert:
Seems to be the incorrect way to handle multi-results. Instead of
returning an array of values, we should compute the average and return
a standard deviation as the last result.

https://chromeperf.appspot.com/report?sid=3b11bf2aa2cf2281472588fed4dfc1e0a9f514ff10bd9db09cb208c5306ac3c9

Original change's description:
> Add multi results in legacy perf dashboard script.
> 
> Currently when the script reads multiple results it outputs the
> last value read, as well as a zero placeholder for standard deviation.
> 
> According to benjhayden@ we can return all results in an array.
> This change does just that and adds a test. It also updates the
> expected unit test output.
> 
> Bug:  900677 
> Change-Id: Icc38808ea83caddbdf6b8582e2ddc34db4f8cdf0
> Reviewed-on: https://chromium-review.googlesource.com/c/1338468
> Reviewed-by: Ned Nguyen <nednguyen@google.com>
> Commit-Queue: Jamie Madill <jmadill@chromium.org>
> Cr-Commit-Position: refs/heads/master@{#608882}

TBR=jmadill@chromium.org,benjhayden@chromium.org,nednguyen@google.com,eyaich@chromium.org

Change-Id: I24fe0fae6607bf4d1bf8fb55d2c4f78d18a74b71
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Bug:  900677 
Reviewed-on: https://chromium-review.googlesource.com/c/1341087
Reviewed-by: Jamie Madill <jmadill@chromium.org>
Commit-Queue: Jamie Madill <jmadill@chromium.org>
Cr-Commit-Position: refs/heads/master@{#609076}
[modify] https://crrev.com/671499f4d257cab081e4342fa1eae17e930b0ac2/tools/perf/generate_legacy_perf_dashboard_json.py
[modify] https://crrev.com/671499f4d257cab081e4342fa1eae17e930b0ac2/tools/perf/testdata/artificial_graph-summary.dat
[modify] https://crrev.com/671499f4d257cab081e4342fa1eae17e930b0ac2/tools/perf/testdata/commit_charge-summary.dat
[modify] https://crrev.com/671499f4d257cab081e4342fa1eae17e930b0ac2/tools/perf/testdata/graphing_processor.log
[modify] https://crrev.com/671499f4d257cab081e4342fa1eae17e930b0ac2/tools/perf/testdata/processes-summary.dat
[modify] https://crrev.com/671499f4d257cab081e4342fa1eae17e930b0ac2/tools/perf/testdata/vm_final_browser-summary.dat
[modify] https://crrev.com/671499f4d257cab081e4342fa1eae17e930b0ac2/tools/perf/testdata/vm_final_total-summary.dat
[modify] https://crrev.com/671499f4d257cab081e4342fa1eae17e930b0ac2/tools/perf/testdata/ws_final_browser-summary.dat
[modify] https://crrev.com/671499f4d257cab081e4342fa1eae17e930b0ac2/tools/perf/testdata/ws_final_total-summary.dat

Comment 14 by bugdroid1@chromium.org, Nov 19

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/ce4bc3cb864694fe5ed543ec4047d7c639d4e9b0

commit ce4bc3cb864694fe5ed543ec4047d7c639d4e9b0
Author: Jamie Madill <jmadill@chromium.org>
Date: Mon Nov 19 17:26:34 2018

Re-land: Add multi results in legacy perf dashboard script.

Currently when the script reads multiple results it overwrites the
current result with the last value read. It does not compute a mean
or standard deviation.

This change updates the script to keep a list of values. It then
computes a mean and standard deviation from the list.

The first attempt at fixing this returned a list of values. This
was incorrect. Also includes a comment and an assert to prevent this.

Bug:  900677 
Change-Id: Ided17ea36478128e003e6d9317ca5fb15128415a
Reviewed-on: https://chromium-review.googlesource.com/c/1341211
Commit-Queue: Jamie Madill <jmadill@chromium.org>
Reviewed-by: Ned Nguyen <nednguyen@google.com>
Cr-Commit-Position: refs/heads/master@{#609342}
[modify] https://crrev.com/ce4bc3cb864694fe5ed543ec4047d7c639d4e9b0/tools/perf/generate_legacy_perf_dashboard_json.py
[modify] https://crrev.com/ce4bc3cb864694fe5ed543ec4047d7c639d4e9b0/tools/perf/testdata/artificial_graph-summary.dat
[modify] https://crrev.com/ce4bc3cb864694fe5ed543ec4047d7c639d4e9b0/tools/perf/testdata/graphing_processor.log
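(The aggregation the re-land describes amounts to the following; a sketch using Python's statistics module, not the script's actual code. Whether the script uses the population or sample form of the standard deviation is not confirmed here:)

import statistics

# Made-up repeat samples for one trace across iterations.
values = [2343.27, 2351.02, 2340.11]

mean = statistics.mean(values)
stddev = statistics.pstdev(values)  # population form, assumed

# The chart then carries [mean, stddev] instead of the raw list.
print([mean, stddev])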

Status: Fixed (was: Assigned)
The ANGLE dashboards are now working properly with lower-is-better:

https://chromeperf.appspot.com/report?sid=ca17599fd002887de77c5d653606f8ca2eeb0df9163effd474ae501e6271a991

They also have the correct statistics calculations, from the CL in comment #14, to smooth out variance. I think I'm going to close this out without doing the other changes I had planned. The script is a bit complicated, and the changes would cause additional work to migrate the dashboards and notify the various benchmark owners.
jmadill@: do you mind filing bugs for the clean-up work so we can backlog it?
Blocking: 919503
ned: filed issue 919503. I noticed the new dashboards picked up what seemed to be the correct test/measurement, so I didn't file a bug about inverting the data structure. It seems pretty low priority, as the old dashboards will be deprecated. Or it could be folded into issue 919503 as well if I'm mistaken.
