Issue 921342

Issue metadata

Status: Assigned
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: Bug-Regression

Blocked on:
issue 922041

Blocking:
issue 920325
issue 921693
issue 923564



angle_perftests (and other gtest perf tests) steps on the perf waterfall are uploading incorrect measurements

Reported by jmad...@chromium.org, Jan 12

Issue description

This test suite was accidentally broken around Jan 7 and is being restored in issue 921039. In the meantime, it looks like a change has regressed the uploaded values. You can see this in the graph here:

https://chromeperf.appspot.com/report?sid=d92cb2d6ce4eb5a1edb54aa473a5d27902cd316dcaa4c86ec20df53c14be22f9&start_rev=612117&end_rev=622295

Values in the range of tens of nanoseconds are being reported in the millions. It seems like a script that parses the results is producing incorrect values. This is also apparent in the 'outdir_json' steps:

before:

https://logs.chromium.org/logs/chrome/buildbucket/cr-buildbucket.appspot.com/8925032059613019136/+/steps/angle_perftests_on_NVIDIA_GPU_on_Windows_on_Windows-2008ServerR2-SP1/0/logs/outdir_json/0

"DrawCallPerf_gl_null_vbo_change\": {\"traces\": {\"wall_time\": [\"343.932724596\", \"0.0104743098631\"]}, \"units\": \"ns\", \"important\"

after:

https://logs.chromium.org/logs/chrome/buildbucket/cr-buildbucket.appspot.com/8924553411965986368/+/steps/angle_perftests_on_NVIDIA_GPU_on_Windows_on_Windows-2008ServerR2-SP1/0/logs/outdir_json/0

{\"sampleValues\": [1355142850.82], \"name\": \"DrawCallPerf_d3d11_vbo_change\", \"running\": [2, 1355142850.82, 0, 1355142850.82, 1355142850.82, 

bsheedy, could this be related to your change "Support gtest perf test uploading on LUCI"? https://crrev.com/c/1371008 

Also seems related to issue 919503.
 
Yes, it's caused by my patch. It's multiplying by a million because the histograms data format only supports milliseconds for time measurements, so the conversion script converts from nanoseconds to milliseconds.

How were these bots uploading results successfully before my patch? AFAIK there shouldn't have been any way for gtest data to be uploaded to the dashboard from LUCI since there wasn't any conversion from the gtest perf format to histograms. Or is the /add_histograms endpoint for the dashboard a misnomer?


Brian, could we revert your patch cleanly while we investigate, or would the revert cause problems?

It would cause problems with other tests that rely on it in order to upload successfully, although there are probably fewer of those than the number of tests broken by this.

This might be something that has to be dealt with at some point anyway, since AFAIK the plan is to eventually remove all non-histogram data formats from use, so the results will need to be reported in ms regardless (I tried adding micro- and nanoseconds to histograms when adding my conversion script, but was told not to).

Looking at the logs from before my patch, it looks like it was still uploading to the old /add_point endpoint, which means all the LUCI builders on chromium.perf that run gtest perf tests have their IPs whitelisted.

Brian, your help getting my tests working again would be appreciated. I am planning to use the results in a presentation to customers next week.

I believe this script converts formats for upload:

https://cs.chromium.org/chromium/src/tools/perf/generate_legacy_perf_dashboard_json.py
Blocking: 920325
Labels: -Pri-1 Pri-0
Essentially this is P0 from a perf perspective.
We can revert the patch temporarily if it's important to have the old results in the near future, but I think we're going to need to eat the one-time regression caused by the unit change at some point.

The conversion script is https://cs.chromium.org/chromium/src/third_party/catapult/tracing/tracing/value/gtest_json_converter.py - the "legacy" perf format seems to be unrelated.
Labels: -Pri-0 Pri-1
I see. I didn't realize the values were correct and are now represented in milliseconds. Is there some way I can be alerted to these kinds of changes before they happen, e.g. a Google group or list I should follow?

The new dashboard was displaying times on the order of days and weeks:

https://v2spa-dot-chromeperf.appspot.com/#session=a2938445f94539692d908ba303cc15edc66e5af54ff5fb06d8d8a09dc4256d89

Maybe that's a separate bug. I was also not seeing the results from the win-10 Intel bot on the http://chromeperf.appspot.com dashboard, though they were showing up on the new dashboard. But again, maybe that's a separate bug. Should we close this bug out as WontFix and address the follow-ups? I'm not sure which dashboard I should be using for my results.


I'm seeing the Win 10 results on chromeperf, e.g. https://chromeperf.appspot.com/report?sid=16732c93ebe1929a1a925281178d49d263b31d1b903eba28c20dcbec8387f225.

Not sure what the difference between the two dashboards is - I've always used chromeperf.appspot.com, and didn't even know the other existed until now.

Aha, I see the issue.

"Some charts failed to load. Either they don't exist, or they are empty, or they are accessible only for signed-in google.com accounts."

I guess this is expected. Any plans to make any data from this bot public? There's nothing secret there for angle_perftests.

I guess the only follow-up here is for me to file an issue about the new dashboard. Also, the measurement and test case are inverted in the new dashboard, as noticed in issue 900677.

v2spa appears to be rendering msBestFitFormat scalars as intended. 518 million ms is about 6 days. At some point, I might change the time auto-formatting to render "X days, X hours" or "X h, X min" or something instead of decimals.
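(Quick check of the unit math: 518,000,000 ms = 518,000 s ≈ 143.9 hours ≈ 6.0 days.)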

On the dashboard side, we can edit old data to multiply/divide by some factor. Please let us know which timeseries to scale, between what revisions, and by what factor. Depending on the number of timeseries, it should take anywhere from a few minutes to a few hours.
Oh wait, I apparently can't convert time properly. The conversion factor is off (it's multiplying when it should be dividing)... I'll submit a fix.
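For clarity, a minimal sketch of the two directions (hypothetical illustration code, not the actual gtest_json_converter.py), using the wall_time value from the "before" log in the issue description:

NS_PER_MS = 1e6  # 1 ms = 1,000,000 ns

def ns_to_ms_correct(value_ns):
  # Intended behavior: divide nanoseconds down to milliseconds.
  return value_ns / NS_PER_MS

def ns_to_ms_buggy(value_ns):
  # The regression: multiplying by the factor instead of dividing.
  return value_ns * NS_PER_MS

print(ns_to_ms_correct(343.932724596))  # ~0.000344 ms
print(ns_to_ms_buggy(343.932724596))    # ~343932724.6 -- millions-scale,
                                        # like the values in the "after" log
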
Right. Derp. That multiply->divide should fix the other dashboard too. Should I file an issue to get the data in the dashboard adjusted?
#13: Yes, please file a crbug with Speed>Dashboard to scale values. Please include the timeseries test path patterns (or v2spa descriptors), scale factor (specify multiply or divide), and revision ranges.
For reference, I'd want everything in angle_perftests on the "Win 7 NVIDIA GPU Perf" and "android-nexus5x-perf" bots from revision 608300 to revision 620236 to be divided by one million.
Blocking: 921693
Filed issue 921693.

Comment 18 by bugdroid1@chromium.org, Jan 14

The following revision refers to this bug:
  https://chromium.googlesource.com/catapult/+/df10be7490c9ecf7fb2b0ddaa5fd8392e8d63b38

commit df10be7490c9ecf7fb2b0ddaa5fd8392e8d63b38
Author: bsheedy <bsheedy@chromium.org>
Date: Mon Jan 14 20:14:12 2019

Fix gtest conversion multiplier

Fixes the us/ns multipliers in gtest_json_converter.py. They were
accidentally multiplying by 1000/1000000 instead of dividing, so the
resulting ms result was far too large.

Bug: chromium:921342
Change-Id: Iac3c5bc79d56f35a17c8a63cf8dcaa739bef4d68
Reviewed-on: https://chromium-review.googlesource.com/c/1409529
Reviewed-by: Ben Hayden <benjhayden@chromium.org>
Commit-Queue: Brian Sheedy <bsheedy@chromium.org>

[modify] https://crrev.com/df10be7490c9ecf7fb2b0ddaa5fd8392e8d63b38/tracing/tracing/value/gtest_json_converter_unittest.py
[modify] https://crrev.com/df10be7490c9ecf7fb2b0ddaa5fd8392e8d63b38/tracing/tracing/value/gtest_json_converter.py

Blockedon: 922041

Comment 20 by bugdroid1@chromium.org, Jan 16

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/c9d39cb7402e371e19d1485f07270953c266d4cd

commit c9d39cb7402e371e19d1485f07270953c266d4cd
Author: chromium-autoroll <chromium-autoroll@skia-public.iam.gserviceaccount.com>
Date: Wed Jan 16 02:43:37 2019

Roll src/third_party/catapult 96320b515106..78448d90081e (17 commits)

https://chromium.googlesource.com/catapult.git/+log/96320b515106..78448d90081e


git log 96320b515106..78448d90081e --date=short --no-merges --format='%ad %ae %s'
2019-01-15 vollick@chromium.org Plumb the trace buffer size for atrace
2019-01-15 perezju@chromium.org Revert "[py_utils] Add modules_util.RequireVersion"
2019-01-15 taylori@google.com Fix handling of perfetto protobuf on mac
2019-01-15 perezju@chromium.org [dashboard] Remove dead code in start_try_job.py
2019-01-15 perezju@chromium.org Revert "Remove TagMap."
2019-01-15 perezju@chromium.org [Telemetry] Add list_of_scalar_values.mean
2019-01-15 dtu@chromium.org [pinpoint] Add Tags field to try job dialog.
2019-01-15 dtu@chromium.org [pinpoint] Ignore cached isolate hashes over 8 weeks old.
2019-01-14 eroman@chromium.org Fix import of new NetExport generated logs.
2019-01-14 bsheedy@chromium.org Fix gtest conversion multiplier
2019-01-14 eyaich@google.com Adding timeToFirstViewportReady metric
2019-01-14 perezju@chromium.org [py_utils] Add modules_util.RequireVersion
2019-01-14 perezju@chromium.org Remove GetNetworkData methods
2019-01-14 perezju@chromium.org [cli services] Add buildbucket_service
2019-01-14 perezju@chromium.org [Telemetry] Remove TBMv1 metrics.SmoothnessMetric
2019-01-11 benjhayden@chromium.org Remove TagMap.
2019-01-11 benjhayden@chromium.org Truncate serialized floats in HistogramSet JSON.


Created with:
  gclient setdep -r src/third_party/catapult@78448d90081e

The AutoRoll server is located here: https://autoroll.skia.org/r/catapult-autoroll

Documentation for the AutoRoller is here:
https://skia.googlesource.com/buildbot/+/master/autoroll/README.md

If the roll is causing failures, please contact the current sheriff, who should
be CC'd on the roll, and stop the roller if necessary.

CQ_INCLUDE_TRYBOTS=luci.chromium.try:android_optional_gpu_tests_rel;luci.chromium.try:linux_optional_gpu_tests_rel;luci.chromium.try:mac_optional_gpu_tests_rel;luci.chromium.try:win_optional_gpu_tests_rel

BUG=chromium:776709,chromium:777865,chromium:918218,chromium:921616,chromium:874940,chromium:916877,chromium:917273,chromium:921342,chromium:904879,chromium:776709,chromium:777865,chromium:480512,chromium:919093,chromium:691581,chromium:921000,chromium:918218,chromium:918208
TBR=sullivan@chromium.org

Change-Id: I75c07d8719f5668f9b4a95a9757cf6c25412b547
Reviewed-on: https://chromium-review.googlesource.com/c/1413252
Reviewed-by: chromium-autoroll <chromium-autoroll@skia-public.iam.gserviceaccount.com>
Commit-Queue: chromium-autoroll <chromium-autoroll@skia-public.iam.gserviceaccount.com>
Cr-Commit-Position: refs/heads/master@{#623062}
[modify] https://crrev.com/c9d39cb7402e371e19d1485f07270953c266d4cd/DEPS

Comment 21 by jmad...@chromium.org, Jan 16 (6 days ago)

Status: Fixed (was: Assigned)
Thanks. Seems to be fixed now. Will follow up in issue 921693.

Comment 22 by jmad...@chromium.org, Jan 17 (5 days ago)

Status: Assigned (was: Fixed)
Brian, the sample values seem like they might be off by a factor of 2.

The benchmark outputs

[ RUN      ] DrawCallPerfBenchmark.Run/gl
*RESULT DrawCallPerf_gl: wall_time= 55.8361926001 ns
*RESULT DrawCallPerf_gl: wall_time= 58.5454059982 ns
*RESULT DrawCallPerf_gl: wall_time= 79.1849336105 ns
[       OK ] DrawCallPerfBenchmark.Run/gl (5478 ms)

But on the dashboard I see

Value: 0.000032261 (± 0.000010)

The correct value should be 64 ns.
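(Checking the arithmetic from the samples above: (55.8361926001 + 58.5454059982 + 79.1849336105) / 3 ≈ 64.52 ns ≈ 0.0000645 ms, so the dashboard's 0.000032261 is almost exactly half.)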

I'm wondering if it might be easier if I output a score value instead of a time-based value. That way I also wouldn't have all those ugly leading zeros.

Comment 24 by bsheedy@chromium.org, Jan 17 (5 days ago)

Hm, that's strange. Not sure offhand what could be causing that, but since we have an example of JSON that we can test with, it should be pretty easy to track down.

I'll try to diagnose and submit a fix today.

Comment 25 by bsheedy@chromium.org, Jan 17 (5 days ago)

Well, now I'm confused...

Running the conversion script on the perf data from the swarming task you linked gets 5.59789310381e-05 for the average, which is what we expect. After the data is converted here (https://cs.chromium.org/chromium/src/tools/perf/process_perf_results.py?q=gtest_json_converter&sq=package:chromium&dr=C&l=101), the numerical data shouldn't be modified again before being uploaded to the dashboard.

The data is modified once more before uploading (https://cs.chromium.org/chromium/src/tools/perf/process_perf_results.py?q=gtest_json_converter&sq=package:chromium&dr=C&l=101), but that shouldn't be modifying any of the numerical data, only metadata-like information like what builder the results came from.

Comment 26 by bsheedy@chromium.org, Jan 17 (5 days ago)

The average I'm referring to in c#25 is for DrawCallPerf_d3d11, which has the following raw results:

*RESULT DrawCallPerf_d3d11: wall_time= 56.8253159747 ns
*RESULT DrawCallPerf_d3d11: wall_time= 55.4272671712 ns
*RESULT DrawCallPerf_d3d11: wall_time= 55.6842099683 ns

and this data in the gtest JSON:

"wall_time": ["55.9789310381", "0.607607601426"]

(an average of 55.98 and a standard deviation of 0.61)

Comment 27 by bsheedy@chromium.org, Jan 17 (5 days ago)

Ben, would it be possible to check what the actual data uploaded to the dashboard was? And related to that, does the dashboard perform any post-processing on the data that could be causing this?

Comment 28 by jmadill@google.com, Jan 18 (4 days ago)

Any chance of an update? I'm unsure what part of the process might be scaling the values.

Comment 29 by bsheedy@chromium.org, Jan 18 (4 days ago)

No updates on the diagnosis from my end, but there are other bugs being filed that relate to the conversion to histograms, so I'm going to disable the conversion to restore the old behavior until the issues can be fixed.

Comment 30 by crouleau@chromium.org, Jan 18 (4 days ago)

Blocking: 923564

Comment 31 by bugdroid1@chromium.org, Jan 19 (4 days ago)

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/f622e65f3176520f2b3c8a40dd67c14f7f7e603b

commit f622e65f3176520f2b3c8a40dd67c14f7f7e603b
Author: bsheedy <bsheedy@chromium.org>
Date: Sat Jan 19 00:21:34 2019

Disable gtest to histogram conversion

Disables the automatic conversion from gtest perf test JSON to
histograms on LUCI builders, as there are issues with data getting
halved for no obvious reason and the assumption that smaller is better
not working for some metrics.

Bug: 921342, 923536
Change-Id: I88b80e9d8d38798c413d46420c16266951861085
Reviewed-on: https://chromium-review.googlesource.com/c/1409693
Commit-Queue: Brian Sheedy <bsheedy@chromium.org>
Reviewed-by: Caleb Rouleau <crouleau@chromium.org>
Cr-Commit-Position: refs/heads/master@{#624375}
[modify] https://crrev.com/f622e65f3176520f2b3c8a40dd67c14f7f7e603b/tools/perf/process_perf_results.py

Comment 32 by benjhayden@google.com, Jan 19 (3 days ago)

> Ben, would it be possible to check what the actual data uploaded to the dashboard was? And related to that, does the dashboard perform any post-processing on the data that could be causing this?

Histograms are merged in add_reserved_diagnostics before being uploaded.
The running statistics merging algorithms work as long as callers use the add() method to add samples, but gtest_json_converter goes around that method. For efficiency, RunningStatistics.merge() computes merged.mean = merged.sum / merged.count instead of (this.mean * this.count + other.mean * other.count) / merged.count.
https://github.com/catapult-project/catapult/blob/master/tracing/tracing/base/math/running_statistics.html#L119
https://github.com/catapult-project/catapult/blob/master/tracing/tracing/value/histogram.py#L305

Try setting sum = 2 * mean in gtest_json_converter.
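
To illustrate the halving, here's a simplified model of the merge behavior described above - not the real catapult RunningStatistics class. It assumes (based on the "sum = 2 * mean" suggestion) that the converter stores count=2 but sets sum to the mean itself:

class RunningStats(object):
  def __init__(self, count, mean, total):
    self.count, self.mean, self.total = count, mean, total

  def merge(self, other):
    count = self.count + other.count
    total = self.total + other.total
    # The merged mean is derived from sum/count rather than by combining
    # the two means weighted by their counts.
    return RunningStats(count, total / count, total)

# Stats as (apparently) written by gtest_json_converter: count=2, sum=mean.
stats = RunningStats(count=2, mean=55.98, total=55.98)  # sum should be 111.96
merged = stats.merge(RunningStats(0, 0.0, 0.0))
print(merged.mean)  # 27.99 -- half the true mean, matching the factor of 2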

LUCI's build step pages should link to the perf_results.json files.
