angle_perftests (and other gtest perf tests) steps on perf waterfall uploading incorrect measurements
Issue description

This test suite was accidentally broken around Jan 7. It's being restored in issue 921039. It looks like in the meantime a change has regressed the uploaded values. You can see by looking at the graph here:

https://chromeperf.appspot.com/report?sid=d92cb2d6ce4eb5a1edb54aa473a5d27902cd316dcaa4c86ec20df53c14be22f9&start_rev=612117&end_rev=622295

Values in the range of tens of nanoseconds are being reported in the millions. It seems like a script that parses the results is getting incorrect values. It is also apparent in the 'outdir_json' steps.

Before: https://logs.chromium.org/logs/chrome/buildbucket/cr-buildbucket.appspot.com/8925032059613019136/+/steps/angle_perftests_on_NVIDIA_GPU_on_Windows_on_Windows-2008ServerR2-SP1/0/logs/outdir_json/0

"DrawCallPerf_gl_null_vbo_change\": {\"traces\": {\"wall_time\": [\"343.932724596\", \"0.0104743098631\"]}, \"units\": \"ns\", \"important\"

After: https://logs.chromium.org/logs/chrome/buildbucket/cr-buildbucket.appspot.com/8924553411965986368/+/steps/angle_perftests_on_NVIDIA_GPU_on_Windows_on_Windows-2008ServerR2-SP1/0/logs/outdir_json/0

{\"sampleValues\": [1355142850.82], \"name\": \"DrawCallPerf_d3d11_vbo_change\", \"running\": [2, 1355142850.82, 0, 1355142850.82, 1355142850.82,

bsheedy, could this be related to your change "Support gtest perf test uploading on LUCI"? https://crrev.com/c/1371008

Also seems related to issue 919503.
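(For illustration only: a minimal Python sketch, not the actual conversion code, of how an inverted ns-to-ms factor turns values like the ones above into numbers in the hundreds of millions. The function names are made up.)

# Sketch only -- these functions are illustrative, not from any Chromium script.
def ns_to_ms(ns):
    return ns / 1e6          # correct: 1 ms == 1e6 ns

def ns_to_ms_inverted(ns):
    return ns * 1e6          # multiplying instead of dividing

print(ns_to_ms(343.932724596))           # ~0.000344 ms
print(ns_to_ms_inverted(343.932724596))  # ~343,932,725 -- million-scale, like the bad uploads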
Jan 14
Brian, could we revert your patch cleanly while we investigate, or would the revert cause problems?
Jan 14
It would cause problems with other tests that rely on it in order to upload successfully, although there are probably fewer of those than the number of tests that got broken by this. This is something that will have to be dealt with eventually anyway: AFAIK the plan is to remove all non-histogram data formats from use, so the results will need to be reported in ms sooner or later (I tried adding micro- and nanoseconds to histograms when adding my conversion script, but was told not to). Looking at the logs from before my patch, it looks like it was still uploading to the old /add_point endpoint, which means all the LUCI builders on chromium.perf that run gtest perf tests have their IPs whitelisted.
Jan 14
Brian, your help getting my tests working again would be appreciated. I am planning to use the results in a presentation to customers next week. I believe this script converts formats for upload: https://cs.chromium.org/chromium/src/tools/perf/generate_legacy_perf_dashboard_json.py
Jan 14
Essentially this is P0 from a perf perspective.
Jan 14
We can revert the patch temporarily if it's important to have the old results in the near future, but I think we're going to need to eat the one-time regression caused by the unit change at some point. The conversion script is https://cs.chromium.org/chromium/src/third_party/catapult/tracing/tracing/value/gtest_json_converter.py - the "legacy" perf format seems to be unrelated.
Jan 14
I see. I didn't realize the values were correct and are now represented in milliseconds. Is there some way I can be alerted to these kinds of changes before they happen? e.g. a Google group or list I should follow?

The new dashboard was displaying times on the order of days and weeks: https://v2spa-dot-chromeperf.appspot.com/#session=a2938445f94539692d908ba303cc15edc66e5af54ff5fb06d8d8a09dc4256d89 Maybe that's a separate bug. I was also not seeing the results from the win-10 Intel bot on the http://chromeperf.appspot.com dashboard; they were showing up on the new dashboard. But again, maybe a separate bug.

Should we close this bug out as WontFix and address the follow-ups? I'm not sure which dashboard I should be using for my results.
Jan 14
I'm seeing the Win 10 results on chromeperf, e.g. https://chromeperf.appspot.com/report?sid=16732c93ebe1929a1a925281178d49d263b31d1b903eba28c20dcbec8387f225. Not sure what the difference between the two dashboards is - I've always used chromeperf.appspot.com, and didn't even know the other existed until now.
Jan 14
Aha, I see the issue: "Some charts failed to load. Either they don't exist, or they are empty, or they are accessible only for signed-in google.com accounts." I guess this is expected. Are there any plans to make the data from this bot public? There's nothing secret there for angle_perftests. I guess the only follow-up here is for me to file an issue about the new dashboard. Also, the measurement and test case are inverted in the new dashboard, as noted in issue 900677.
Jan 14
v2spa appears to be rendering msBestFitFormat scalars as intended: 518 million ms is about 6 days. At some point, I might change the time auto-formatting to render "X days, X hours" or "X h, X min" or something instead of decimals.

On the dashboard side, we can edit old data to multiply/divide by some factor. Please let us know which timeseries to scale, between what revisions, and by what factor. Depending on the number of timeseries, it should only take anywhere from a few minutes to a few hours.
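(Quick arithmetic check of that statement, in Python:)

# 518 million ms, expressed in days
print(518e6 / 1000 / 86400)  # ~6.0 days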
Jan 14
Oh wait, I apparently can't convert time properly. The conversion factor is off (it's multiplying when it should be dividing)... I'll submit a fix.
Jan 14
Right. Derp. That multiply->divide should fix the other dashboard too. Should I file an issue to get the data in the dashboard adjusted?
Jan 14
#13: Yes, please file a crbug with Speed>Dashboard to scale values. Please include the timeseries test path patterns (or v2spa descriptors), scale factor (specify multiply or divide), and revision ranges.
Jan 14
For reference, I'd want everything in angle_perftests on the "Win 7 NVIDIA GPU Perf" and "android-nexus5x-perf" bots from revision 608300 to revision 620236 to be divided by one million.
Jan 14
Filed issue 921693.
Jan 14
The following revision refers to this bug:
https://chromium.googlesource.com/catapult/+/df10be7490c9ecf7fb2b0ddaa5fd8392e8d63b38

commit df10be7490c9ecf7fb2b0ddaa5fd8392e8d63b38
Author: bsheedy <bsheedy@chromium.org>
Date: Mon Jan 14 20:14:12 2019

Fix gtest conversion multiplier

Fixes the us/ns multipliers in gtest_json_converter.py. They were accidentally multiplying by 1000/1000000 instead of dividing, so the resulting ms result was far too large.

Bug: chromium:921342
Change-Id: Iac3c5bc79d56f35a17c8a63cf8dcaa739bef4d68
Reviewed-on: https://chromium-review.googlesource.com/c/1409529
Reviewed-by: Ben Hayden <benjhayden@chromium.org>
Commit-Queue: Brian Sheedy <bsheedy@chromium.org>

[modify] https://crrev.com/df10be7490c9ecf7fb2b0ddaa5fd8392e8d63b38/tracing/tracing/value/gtest_json_converter_unittest.py
[modify] https://crrev.com/df10be7490c9ecf7fb2b0ddaa5fd8392e8d63b38/tracing/tracing/value/gtest_json_converter.py
Jan 16
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/c9d39cb7402e371e19d1485f07270953c266d4cd

commit c9d39cb7402e371e19d1485f07270953c266d4cd
Author: chromium-autoroll <chromium-autoroll@skia-public.iam.gserviceaccount.com>
Date: Wed Jan 16 02:43:37 2019

Roll src/third_party/catapult 96320b515106..78448d90081e (17 commits)

https://chromium.googlesource.com/catapult.git/+log/96320b515106..78448d90081e

git log 96320b515106..78448d90081e --date=short --no-merges --format='%ad %ae %s'
2019-01-15 vollick@chromium.org Plumb the trace buffer size for atrace
2019-01-15 perezju@chromium.org Revert "[py_utils] Add modules_util.RequireVersion"
2019-01-15 taylori@google.com Fix handling of perfetto protobuf on mac
2019-01-15 perezju@chromium.org [dashboard] Remove dead code in start_try_job.py
2019-01-15 perezju@chromium.org Revert "Remove TagMap."
2019-01-15 perezju@chromium.org [Telemetry] Add list_of_scalar_values.mean
2019-01-15 dtu@chromium.org [pinpoint] Add Tags field to try job dialog.
2019-01-15 dtu@chromium.org [pinpoint] Ignore cached isolate hashes over 8 weeks old.
2019-01-14 eroman@chromium.org Fix import of new NetExport generated logs.
2019-01-14 bsheedy@chromium.org Fix gtest conversion multiplier
2019-01-14 eyaich@google.com Adding timeToFirstViewportReady metric
2019-01-14 perezju@chromium.org [py_utils] Add modules_util.RequireVersion
2019-01-14 perezju@chromium.org Remove GetNetworkData methods
2019-01-14 perezju@chromium.org [cli services] Add buildbucket_service
2019-01-14 perezju@chromium.org [Telemetry] Remove TBMv1 metrics.SmoothnessMetric
2019-01-11 benjhayden@chromium.org Remove TagMap.
2019-01-11 benjhayden@chromium.org Truncate serialized floats in HistogramSet JSON.

Created with:
gclient setdep -r src/third_party/catapult@78448d90081e

The AutoRoll server is located here: https://autoroll.skia.org/r/catapult-autoroll
Documentation for the AutoRoller is here: https://skia.googlesource.com/buildbot/+/master/autoroll/README.md

If the roll is causing failures, please contact the current sheriff, who should be CC'd on the roll, and stop the roller if necessary.

CQ_INCLUDE_TRYBOTS=luci.chromium.try:android_optional_gpu_tests_rel;luci.chromium.try:linux_optional_gpu_tests_rel;luci.chromium.try:mac_optional_gpu_tests_rel;luci.chromium.try:win_optional_gpu_tests_rel
BUG=chromium:776709,chromium:777865,chromium:918218,chromium:921616,chromium:874940,chromium:916877,chromium:917273,chromium:921342,chromium:904879,chromium:776709,chromium:777865,chromium:480512,chromium:919093,chromium:691581,chromium:921000,chromium:918218,chromium:918208
TBR=sullivan@chromium.org

Change-Id: I75c07d8719f5668f9b4a95a9757cf6c25412b547
Reviewed-on: https://chromium-review.googlesource.com/c/1413252
Reviewed-by: chromium-autoroll <chromium-autoroll@skia-public.iam.gserviceaccount.com>
Commit-Queue: chromium-autoroll <chromium-autoroll@skia-public.iam.gserviceaccount.com>
Cr-Commit-Position: refs/heads/master@{#623062}

[modify] https://crrev.com/c9d39cb7402e371e19d1485f07270953c266d4cd/DEPS
Jan 17
Brian, the sample values seem like they might be off by a factor of 2. The benchmark outputs:

[ RUN ] DrawCallPerfBenchmark.Run/gl
*RESULT DrawCallPerf_gl: wall_time= 55.8361926001 ns
*RESULT DrawCallPerf_gl: wall_time= 58.5454059982 ns
*RESULT DrawCallPerf_gl: wall_time= 79.1849336105 ns
[ OK ] DrawCallPerfBenchmark.Run/gl (5478 ms)

But on the dashboard I see:

Value: 0.000032261 (± 0.000010)

The correct value should be about 64 ns. I'm wondering if it might be easier if I output a score value instead of a time-based value. That way I also wouldn't have all those ugly leading zeros.
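(A quick Python check of the factor-of-2 claim, using the three samples above; nothing here comes from the test harness itself:)

samples_ns = [55.8361926001, 58.5454059982, 79.1849336105]
mean_ns = sum(samples_ns) / len(samples_ns)
print(mean_ns)           # ~64.52 ns
print(mean_ns / 1e6)     # ~6.45e-05 ms, the value the dashboard should show
print(0.000032261 * 2)   # ~6.45e-05 ms, i.e. the dashboard value is about half the expected one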
Jan 17
Hm, that's strange. Not sure offhand what could be causing that, but since we have an example of JSON that we can test with, it should be pretty easy to track down. I'll try to diagnose and submit a fix today.
Jan 17
Well, now I'm confused... Running the conversion script on the perf data from the swarming task you linked gets 5.59789310381e-05 for the average, which is what we expect. After the data is converted here (https://cs.chromium.org/chromium/src/tools/perf/process_perf_results.py?q=gtest_json_converter&sq=package:chromium&dr=C&l=101), it is modified once more before being uploaded to the dashboard (https://cs.chromium.org/chromium/src/tools/perf/process_perf_results.py?q=gtest_json_converter&sq=package:chromium&dr=C&l=101), but that step shouldn't touch any of the numerical data, only metadata-like information such as which builder the results came from.
Jan 17
The average I'm referring to in c#25 is for DrawCallPerf_d3d11, which has the following raw results:

*RESULT DrawCallPerf_d3d11: wall_time= 56.8253159747 ns
*RESULT DrawCallPerf_d3d11: wall_time= 55.4272671712 ns
*RESULT DrawCallPerf_d3d11: wall_time= 55.6842099683 ns

and this data in the gtest JSON:

"wall_time": ["55.9789310381", "0.607607601426"]

(an average of 55.98 and a standard deviation of 0.61)
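(Sanity-checking those numbers in Python: the gtest JSON mean matches the raw samples, and the second value matches the population standard deviation of the three samples:)

samples = [56.8253159747, 55.4272671712, 55.6842099683]
mean = sum(samples) / len(samples)
pop_std = (sum((s - mean) ** 2 for s in samples) / len(samples)) ** 0.5
print(mean, pop_std)   # -> ~55.9789, ~0.6076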
Jan 17
Ben, would it be possible to check what the actual data uploaded to the dashboard was? And related to that, does the dashboard perform any post-processing on the data that could be causing this?
Jan 18
Any chance of an update? I'm unsure what part of the process might be scaling the values.
Jan 18
No updates on the diagnosis from my end, but there are other bugs being filed that relate to the conversion to histograms, so I'm going to disable the conversion and get uploads back to the old behavior until the issues can be fixed.
Jan 19
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/f622e65f3176520f2b3c8a40dd67c14f7f7e603b

commit f622e65f3176520f2b3c8a40dd67c14f7f7e603b
Author: bsheedy <bsheedy@chromium.org>
Date: Sat Jan 19 00:21:34 2019

Disable gtest to histogram conversion

Disables the automatic conversion from gtest perf test JSON to histograms on LUCI builders, as there are issues with data getting halved for no obvious reason and the assumption that smaller is better not working for some metrics.

Bug: 921342, 923536
Change-Id: I88b80e9d8d38798c413d46420c16266951861085
Reviewed-on: https://chromium-review.googlesource.com/c/1409693
Commit-Queue: Brian Sheedy <bsheedy@chromium.org>
Reviewed-by: Caleb Rouleau <crouleau@chromium.org>
Cr-Commit-Position: refs/heads/master@{#624375}

[modify] https://crrev.com/f622e65f3176520f2b3c8a40dd67c14f7f7e603b/tools/perf/process_perf_results.py
Jan 19
> Ben, would it be possible to check what the actual data uploaded to the dashboard was? And related to that, does the dashboard perform any post-processing on the data that could be causing this?

Histograms are merged in add_reserved_diagnostics before being uploaded. The running statistics merging algorithms work as long as callers use the add() method to add samples, but gtest_json_converter goes around that method. RunningStatistics.merge() computes merged.mean = merged.sum / merged.count instead of (this.mean * this.count + other.mean * other.count) / merged.count for efficiency.

https://github.com/catapult-project/catapult/blob/master/tracing/tracing/base/math/running_statistics.html#L119
https://github.com/catapult-project/catapult/blob/master/tracing/tracing/value/histogram.py#L305

Try setting sum = 2 * mean in gtest_json_converter.

LUCI's build step pages should link to the perf_results.json files.
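(To make that failure mode concrete, here is a minimal Python sketch. It is illustrative only and does not use the actual catapult RunningStatistics code; the "broken" record below is a guess at what the converter was effectively producing, based on the suggestion above to set sum = 2 * mean.)

# Sketch only: a running-stats record of (count, sum), and a merge that
# recomputes the mean as sum / count, like RunningStatistics.merge() does.
def merge(a, b):
    count = a["count"] + b["count"]
    total = a["sum"] + b["sum"]
    return {"count": count, "sum": total, "mean": total / count}

mean = 55.9789310381
correct = {"count": 2, "sum": 2 * mean}   # sum == mean * count
broken = {"count": 2, "sum": mean}        # hypothetical converter output: sum == mean
empty = {"count": 0, "sum": 0.0}

print(merge(correct, empty)["mean"])      # ~55.98
print(merge(broken, empty)["mean"])       # ~27.99 -- halved, matching the dashboard symptom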