~15% regression in passthrough command decoder perftests.
Issue description
Aug 13
Ned: do you know how we can get alerts set up for these tests?

Also I'm suspecting this change:

jie.a.chen@intel.com  ParallelCompile: Parallelize D3D linking

Just a heads up to Jie that we might need to do some perf testing with your CL. You can use angle_perftests --gtest_filter=DrawCall*gl_null to test before/after your patch.
Aug 13
#3: you would file a bug to monitor the benchmark with Components "Speed>Dashboard"
Aug 14
Geoff, I did the testing as you suggested. Theoretically my patch may cause some thread overhead if linking lots of simple shaders, but in this case there is only one Program::link() call, so I think it's unrelated to the regression.

// Before my patch.
c:\workspace\angle>out\Debug\angle_perftests --gtest_filter=DrawCall*gl_null
WARN: rx::`anonymous-namespace'::GetDesiredPresentMode(47): Present mode 1 not available. Falling back to 0
ERR: egl::Display::initialize(478): ANGLE Display::initialize error 12289: Intel OpenGL ES drivers are not supported.
Skipping tests using configuration ES2_OPENGLES because it is not available.
Note: Google Test filter = DrawCall*gl_null
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from DrawCallPerfBenchmark
[ RUN ] DrawCallPerfBenchmark.Run/gl_null
*RESULT DrawCallPerf_gl_null: score= 11366 score
[ OK ] DrawCallPerfBenchmark.Run/gl_null (10039 ms)
[----------] 1 test from DrawCallPerfBenchmark (10047 ms total)
[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (10067 ms total)
[ PASSED ] 1 test.

// After my patch.
c:\workspace\angle>out\Debug\angle_perftests --gtest_filter=DrawCall*gl_null
WARN: rx::`anonymous-namespace'::GetDesiredPresentMode(47): Present mode 1 not available. Falling back to 0
ERR: egl::Display::initialize(478): ANGLE Display::initialize error 12289: Intel OpenGL ES drivers are not supported.
Skipping tests using configuration ES2_OPENGLES because it is not available.
Note: Google Test filter = DrawCall*gl_null
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from DrawCallPerfBenchmark
[ RUN ] DrawCallPerfBenchmark.Run/gl_null
*RESULT DrawCallPerf_gl_null: score= 11212 score
[ OK ] DrawCallPerfBenchmark.Run/gl_null (10053 ms)
[----------] 1 test from DrawCallPerfBenchmark (10079 ms total)
[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (10110 ms total)
[ PASSED ] 1 test.
Aug 14
📍 Pinpoint job started. https://pinpoint-dot-chromeperf.appspot.com/job/14116898640000
Aug 14
😿 Pinpoint job stopped with an error. https://pinpoint-dot-chromeperf.appspot.com/job/14116898640000
All of the runs failed. The most common error (1/20 runs) was:
IOError: [Errno 2] No such file or directory: 'c:\\b\\s\\w\\itr03n5k\\tmpgtydtgtelemetry\\histograms.json'
Aug 15
Thanks for your answers in #4, Ned.
Do you have any idea why the tests would fail with this error?
Traceback (most recent call last):
File "/base/data/home/apps/s~chromeperf/pinpoint:clean-dtu-501bed82.411835512812192005/dashboard/pinpoint/models/quest/execution.py", line 95, in Poll
self._Poll()
File "/base/data/home/apps/s~chromeperf/pinpoint:clean-dtu-501bed82.411835512812192005/dashboard/pinpoint/models/quest/run_test.py", line 211, in _Poll
'message was:\n%s' % exception_string)
SwarmingTestError: The test failed. The test's error message was:
IOError: [Errno 2] No such file or directory: 'c:\\b\\s\\w\\it7lyfou\\tmp5mlqiqtelemetry\\histograms.json'
Aug 15
+Dave: can you look into #9?
Aug 15
📍 Pinpoint job started. https://pinpoint-dot-chromeperf.appspot.com/job/1249a382640000
Aug 15
The regression also showed up in angle_perftests.
Aug 15
I looked at the swarming task log. The failure was due to the command line not being set up properly:

c:\infra-system\bin\python.exe ..\..\testing\scripts\run_performance_tests.py ../../tools/perf/run_benchmark --benchmarks passthrough_command_buffer_perftests --story-filter wall.time --pageset-repeat 1 --browser release_x64 -v --upload-results --output-format histograms --isolated-script-test-output c:\b\s\w\ioyplhwt\output.json --isolated-script-test-chartjson-output c:\b\s\w\ioyplhwt\chartjson-output.json --results-label chromium@3c85414

The "passthrough_command_buffer_perftests" benchmark is a special case: the benchmark command should still be command_buffer_perftests. The "passthrough_" part only reflects the extra flags that get passed to it.
Aug 15
jmadill@, at this point I think we should reconsider building a passthrough_command_buffer_perftests binary that is basically the same as command_buffer_perftests but with the extra flags set. The fact that the benchmark name isn't the same as the binary name already caused issue 870692, and now it also makes it difficult for bisection to work correctly.
Aug 15
I agree. I think for now it's not a big deal if the bisect doesn't work; we can find the same regression in angle_perftests. Rewriting the tests to handle the multiple configs would require some trickiness with launching multiple processes internally in the test, and I'm not much of an expert on that. So maybe we can defer that task for later?
Aug 16
📍 Found a significant difference after 1 commit. https://pinpoint-dot-chromeperf.appspot.com/job/1249a382640000
Roll src/third_party/angle ea926a362b77..7ae70d8fb360 (1 commits) by angle-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com
https://chromium.googlesource.com/chromium/src/+/13044802db6b6a9e3689e40d30bccb22f8930124
2.33e+05 → 2.106e+05 (-2.236e+04)
Assigning to sheriff ynovikov@chromium.org because "Roll src/third_party/angle ea926a362b77..7ae70d8fb360 (1 commits)" is a roll.
Understanding performance regressions: http://g.co/ChromePerformanceRegressions
Aug 16
#16: I am not proposing we merge the binary. Instead, make separate binaries:

command_buffer_perftests binary --> run command_buffer_perftests as before
passthrough_command_buffer_perftests binary --> run command_buffer_perftests with the --passthrough flag
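For illustration only, here is a minimal sketch of what such a wrapper binary could look like. The entry-point name RunCommandBufferPerfTests is an assumption (a stand-in for whatever shared test-suite entry point the real binary uses), the stub definition exists only so the sketch compiles on its own, and the flag name is taken from the comment above rather than from the real switch.

// Hypothetical sketch: a thin passthrough_command_buffer_perftests wrapper
// that reuses the existing command_buffer_perftests code but always appends
// the passthrough switch.
#include <cstdio>
#include <string>
#include <vector>

// Stand-in for the shared entry point the real command_buffer_perftests
// binary would link against; name and signature are assumptions.
int RunCommandBufferPerfTests(int argc, char** argv) {
  for (int i = 0; i < argc; ++i)
    std::printf("arg[%d] = %s\n", i, argv[i]);
  return 0;
}

int main(int argc, char** argv) {
  // Copy the original arguments and force the passthrough flag
  // (flag name as written in the comment above; the real switch may differ).
  std::vector<std::string> args(argv, argv + argc);
  args.push_back("--passthrough");

  std::vector<char*> rawArgs;
  for (std::string& arg : args)
    rawArgs.push_back(&arg[0]);

  return RunCommandBufferPerfTests(static_cast<int>(rawArgs.size()),
                                   rawArgs.data());
}

With this split, the benchmark name matches the binary name again, which is what the bisection tooling needs to run it directly.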
Aug 16
Ned: okay, understood. I still don't know how to implement that; are you okay with deferring that fix for later? We can get bisect coverage using angle_perftests, and the existing test setup gives us a very useful dashboard with a clear history.

Looks like the regression pinpointed in #17 is:

jie.a.chen@intel.com  ParallelCompile: Parallelize D3D linking

Jie - can you try re-testing? I suggest building with gn args:

is_debug = false
target_cpu = "x64"

and running scripts/perf_test_runner.py DrawCall*gl_null to produce a high number of iterations and reduce variance. See https://chromium.googlesource.com/angle/angle/+/master/src/tests/perf_tests/README.md
Aug 16
Jamie - I have re-tested as you suggested. It seems to me the difference is trivial. Anyway, I will double-check the patch.

// Before the patch
c:\workspace\angle>python scripts\perf_test_runner.py DrawCall*gl_null
Using test executable: c:\workspace\angle\out\Release\angle_perftests.exe
Test name: DrawCall*gl_null
score: 288847, mean: 288847.0
score: 293156, mean: 291001.5, stddev: 2154.5
score: 293878, mean: 291960.333333, stddev: 2221.10397976
score: 295307, mean: 292797.0, stddev: 2408.3210957
score: 295050, mean: 293247.6, stddev: 2334.98818841
score: 291564, mean: 292967.0, stddev: 2221.97134695
score: 294563, mean: 293195.0, stddev: 2131.6079779
score: 295320, mean: 293460.625, stddev: 2114.16176873, truncated mean: 293919.666667, stddev: 1274.77536147
score: 295141, mean: 293647.333333, stddev: 2062.02058401, truncated mean: 294094.142857, stddev: 1255.21056043
score: 296962, mean: 293978.8, stddev: 2194.44010171, truncated mean: 294247.375, stddev: 1242.16342901
score: 296575, mean: 294214.818182, stddev: 2221.44870087, truncated mean: 294506.0, stddev: 1380.80556198
score: 296667, mean: 294419.166667, stddev: 2232.24986778, truncated mean: 294722.1, stddev: 1461.59313422
score: 297254, mean: 294637.230769, stddev: 2273.82054477, truncated mean: 294925.727273, stddev: 1535.15152293
score: 294653, mean: 294638.357143, stddev: 2191.11206622, truncated mean: 294903.0, stddev: 1471.7272732
score: 297129, mean: 294804.4, stddev: 2206.10301361, truncated mean: 295074.230769, stddev: 1533.36458377
score: 296435, mean: 294906.3125, stddev: 2172.21136514, truncated mean: 295308.916667, stddev: 1125.93135214
score: 296299, mean: 294988.235294, stddev: 2132.68018422, truncated mean: 295385.076923, stddev: 1113.46710507
score: 296603, mean: 295077.944444, stddev: 2105.33880811, truncated mean: 295472.071429, stddev: 1117.87109302
score: 295017, mean: 295074.736842, stddev: 2049.23150787, truncated mean: 295441.733333, stddev: 1085.91543358
score: 296264, mean: 295134.2, stddev: 2014.09129882, truncated mean: 295493.125, stddev: 1070.10658786
score: 296179, mean: 295183.952381, stddev: 1978.10516539, truncated mean: 295533.470588, stddev: 1050.62452674
score: 295466, mean: 295196.772727, stddev: 1933.5181766, truncated mean: 295529.722222, stddev: 1021.14047383
score: 295824, mean: 295224.043478, stddev: 1895.33919135, truncated mean: 295545.210526, stddev: 996.075011916
score: 279560, mean: 294571.375, stddev: 3638.69239119, truncated mean: 295466.5, stddev: 964.144252116
score: 293172, mean: 294515.4, stddev: 3575.70634141, truncated mean: 295345.736842, stddev: 1069.1849745
score: 294388, mean: 294510.5, stddev: 3506.35413598, truncated mean: 295297.85, stddev: 1062.8116143

// After the patch
c:\workspace\angle>python scripts\perf_test_runner.py DrawCall*gl_null
Using test executable: c:\workspace\angle\out\parallelRelease\angle_perftests.exe
Test name: DrawCall*gl_null
score: 288610, mean: 288610.0
score: 277381, mean: 282995.5, stddev: 5614.5
score: 288446, mean: 284812.333333, stddev: 5255.17271098
score: 289149, mean: 285896.5, stddev: 4923.299935
score: 288104, mean: 286338.0, stddev: 4491.19079978
score: 286116, mean: 286301.0, stddev: 4100.71221456
score: 289333, mean: 286734.142857, stddev: 3941.9860632
score: 289329, mean: 287058.5, stddev: 3785.93452796, truncated mean: 288292.333333, stddev: 1057.11693876
score: 289797, mean: 287362.777778, stddev: 3671.70138515, truncated mean: 288441.0, stddev: 1044.25311655
score: 288300, mean: 287456.5, stddev: 3494.61108709, truncated mean: 288423.375, stddev: 977.921768024
score: 290250, mean: 287710.454545, stddev: 3427.39347991, truncated mean: 288576.0, stddev: 1018.0506427
score: 289526, mean: 287861.75, stddev: 3319.62357618, truncated mean: 288671.0, stddev: 1006.98033744
score: 291121, mean: 288112.461538, stddev: 3305.52436467, truncated mean: 288814.545455, stddev: 1062.01680894
score: 290484, mean: 288281.857143, stddev: 3243.31018862, truncated mean: 288953.666667, stddev: 1116.59820686
score: 290039, mean: 288399.0, stddev: 3163.84331681, truncated mean: 289037.153846, stddev: 1111.09217551
score: 291292, mean: 288579.8125, stddev: 3142.40081822, truncated mean: 289280.583333, stddev: 753.027274665
score: 288640, mean: 288583.352941, stddev: 3048.60947938, truncated mean: 289231.307692, stddev: 743.349114027
score: 286121, mean: 288446.555556, stddev: 3015.92692997, truncated mean: 289009.142857, stddev: 1074.58948023
score: 290484, mean: 288553.789474, stddev: 2970.53398671, truncated mean: 289107.466667, stddev: 1101.4107842
score: 289307, mean: 288591.45, stddev: 2899.96837009, truncated mean: 289119.9375, stddev: 1067.52959378
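As a side note on reading these numbers: the script reports both a plain mean and a "truncated mean". The sketch below shows the general idea of a truncated mean (drop a fraction of the extreme samples, then average the rest); the exact truncation rule perf_test_runner.py uses is not shown in this bug, so the fraction here is an assumption.

// Illustrative only: a generic truncated mean, to explain the statistic in
// the output above. Not taken from perf_test_runner.py.
#include <algorithm>
#include <cstdio>
#include <numeric>
#include <vector>

// Drop roughly `fraction` of the samples from each end, then average the rest.
double TruncatedMean(std::vector<double> samples, double fraction) {
  std::sort(samples.begin(), samples.end());
  size_t drop = static_cast<size_t>(samples.size() * fraction);
  if (samples.size() <= 2 * drop)
    return 0.0;  // not enough samples to truncate
  auto first = samples.begin() + drop;
  auto last = samples.end() - drop;
  return std::accumulate(first, last, 0.0) / static_cast<double>(last - first);
}

int main() {
  // Outliers like the 279560 run above pull the plain mean down noticeably
  // but barely move the truncated mean.
  std::vector<double> scores = {295466, 295824, 279560, 293172, 294388};
  std::printf("truncated mean: %.1f\n", TruncatedMean(scores, 0.2));
  return 0;
}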
Aug 17
Jamie, I have found that this trivial regression of about 2% comes from the overhead of Program::resolveLink in my patch. I managed to reduce it to less than 1%. Anyway, I don't think it should take the blame for the 15% regression.
Aug 17
Jie: thanks. Let's try landing your CL and watching the bots. I agree one inlined function should not make a 15% difference. I can investigate more myself when I get back from travel.
Aug 17
The following revision refers to this bug: https://chromium.googlesource.com/angle/angle/+/5055fba5692f8b3904207ec47ab0a8e340341063

commit 5055fba5692f8b3904207ec47ab0a8e340341063
Author: jchen10 <jie.a.chen@intel.com>
Date: Fri Aug 17 23:10:36 2018

Optimize Program::resolveLink

The method has to be extremely fast as it's very frequently called. It contributes about 2% cpu time in the DrawCall/gl_null benchmark. With this optimization it can be decreased to less than 1%.

Bug: chromium:873724
Change-Id: I7fb376db73452dbdf6cb44c92815848e860867c9
Reviewed-on: https://chromium-review.googlesource.com/1179369
Reviewed-by: Geoff Lang <geofflang@chromium.org>
Commit-Queue: Jie A Chen <jie.a.chen@intel.com>

[modify] https://crrev.com/5055fba5692f8b3904207ec47ab0a8e340341063/src/libANGLE/Program.h
[modify] https://crrev.com/5055fba5692f8b3904207ec47ab0a8e340341063/src/libANGLE/Program.cpp
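For readers following along, here is a minimal sketch of the general fast-path/slow-path pattern such an optimization typically relies on; this is not the actual ANGLE Program code and all names are illustrative. The per-draw-call check stays inline and only reads a cached flag, while the expensive work of resolving a parallel link runs out of line, at most once.

// Sketch only, not ANGLE's real code.
#include <atomic>
#include <future>

class ProgramSketch {
 public:
  // Called on every draw; must be cheap. Defined in the header so the common
  // "already resolved" case is just a load and a predictable branch.
  void resolveLink() {
    if (!mLinkResolved)
      resolveLinkImpl();  // cold path
  }

  void startParallelLink() {
    mLinkResolved = false;
    mLinkTask = std::async(std::launch::async, [] {
      // Stand-in for the real D3D shader/program linking work.
      return true;
    });
  }

  bool linkedSuccessfully() const { return mLinkSucceeded; }

 private:
  // In a real codebase this would live out of line in the .cpp; it is defined
  // here only so the sketch compiles on its own.
  void resolveLinkImpl() {
    mLinkSucceeded = mLinkTask.valid() ? mLinkTask.get() : false;
    mLinkResolved = true;
  }

  bool mLinkResolved = true;
  bool mLinkSucceeded = false;
  std::future<bool> mLinkTask;
};

Keeping the hot check down to a cached-flag load is the kind of change that can move its cost from the ~2% measured above toward the <1% mentioned in the commit message.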
Aug 20
The graph didn't recover much; we need to investigate this more. The regression range is only 4 Chrome CLs, so I'm fairly certain it was this CL, but it could also be an issue with the benchmark itself, possibly because the program linking is deferred into the body of the perftest.
Aug 20
Jie: I was able to quite easily reproduce the performance regression in ANGLE standalone with your CL. Is it possible you are not using target_cpu = "x64" in your testing? The regression might only affect 64-bit.
Aug 21
Jamie, I have always been using "x64". Could you share more information about your machine's platform? Was it Linux, Win10, or Win7? I will try to find some other machines to reproduce on.
Aug 21
I reproduced it on my Ubuntu desktop. Having done some profiling, I found the main cause is that some frequently called methods are no longer inlined after my CL. With my newly uploaded patch, the score improves by roughly 10%.
Aug 21
Great! Thank you for investigating. Let's try landing your CL and watching the bots.
Aug 21
To answer your earlier question, I was using Windows 10. I can provide more info if needed.
Aug 22
The following revision refers to this bug: https://chromium.googlesource.com/angle/angle/+/87498164675dde4c3fb4179a8adab74b4980fcaf

commit 87498164675dde4c3fb4179a8adab74b4980fcaf
Author: jchen10 <jie.a.chen@intel.com>
Date: Wed Aug 22 02:45:01 2018

Make some Program methods inlined

These methods are very hot in the DrawCall/gl_null benchmark. With this CL applied, the score can improve about 10% on Linux. This also removes a few unnecessary resolveLink calls.

Bug: chromium:873724
Change-Id: I6034f29eeeebe8341dae3988c38196123687a44f
Reviewed-on: https://chromium-review.googlesource.com/1183522
Commit-Queue: Jie A Chen <jie.a.chen@intel.com>
Reviewed-by: Geoff Lang <geofflang@chromium.org>
Reviewed-by: Jamie Madill <jmadill@chromium.org>

[modify] https://crrev.com/87498164675dde4c3fb4179a8adab74b4980fcaf/src/libANGLE/Program.h
[modify] https://crrev.com/87498164675dde4c3fb4179a8adab74b4980fcaf/src/libANGLE/Program.cpp
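For context, here is a minimal sketch of the kind of change "make some methods inlined" refers to; it is illustrative only and not the actual diff. A tiny, hot accessor defined in Program.cpp cannot be inlined across translation units (absent LTO), so every call from the draw path pays a full function call; moving the definition into the header lets the compiler inline it at each call site.

// Illustrative only; not ANGLE's real Program class.
//
// Before: declared in Program.h, defined in Program.cpp, so other .cpp files
// (e.g. the per-draw validation path) reach it through a normal call:
//
//   bool Program::isLinked() const { return mLinked; }   // in Program.cpp
//
// After: the definition lives in the header, so the compiler can inline the
// load at every call site.
class ProgramSketch {
 public:
  bool isLinked() const { return mLinked; }  // header-defined, inlinable

 private:
  bool mLinked = false;
};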
Aug 23
Jie's fix in #30 has landed and made it to the perf dashboards, but there has been only a slight recovery in https://chromeperf.appspot.com/report?sid=3b11bf2aa2cf2281472588fed4dfc1e0a9f514ff10bd9db09cb208c5306ac3c9 or in https://chromeperf.appspot.com/report?sid=4f05681c83914fc3b8b5dacb47f0dfd17e36cdcc3598a753b9361cde6cd9b785 . I've tested Jie's change repeatedly locally. It seems more time may now be spent in StateManagerGL::setGenericShaderState, but it's not clear why. Closing this as Fixed, since Jie made several CLs that at least partially addressed the problem.