
Issue 873724

Starred by 1 user

Issue metadata

Status: Fixed
Owner:
Closed: Aug 23
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Bug



~15% regression in passthrough command decoder perftests.

Project Member Reported by geoffl...@chromium.org, Aug 13

Issue description

Cc: nednguyen@chromium.org
Ned, when I try to create a bisect for these perf tests, the bisect dialog shows the error "could not convert string to float: None". Could this be due to the test-name fix from issue 870692?
Blockedon: 873331
#1: that's issue 873331.
Cc: jie.a.c...@intel.com
Ned: do you know how we can get alerts set up for these tests?

Also I'm suspecting this change:

jie.a.chen@intel.com ParallelCompile: Parallelize D3D linking

Just a heads up to Jie that we might need to do some perf testing with your CL. You can use angle_perftests --gtest_filter=DrawCall*gl_null to test before/after your patch.
#3: you can file a bug with the component "Speed>Dashboard" to have the benchmark monitored.
Blockedon: -873331 873022
Geoff, I did the testing as you suggested. In theory, my patch may add some threading overhead when linking many simple shaders. But in this case Program::link() is called only once, so I think it is unrelated to the regression.

// Before my patch.
c:\workspace\angle>out\Debug\angle_perftests --gtest_filter=DrawCall*gl_null
WARN: rx::`anonymous-namespace'::GetDesiredPresentMode(47): Present mode 1 not available. Falling back to 0
ERR: egl::Display::initialize(478): ANGLE Display::initialize error 12289: Intel OpenGL ES drivers are not supported.
Skipping tests using configuration ES2_OPENGLES because it is not available.
Note: Google Test filter = DrawCall*gl_null
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from DrawCallPerfBenchmark
[ RUN      ] DrawCallPerfBenchmark.Run/gl_null
*RESULT DrawCallPerf_gl_null: score= 11366 score
[       OK ] DrawCallPerfBenchmark.Run/gl_null (10039 ms)
[----------] 1 test from DrawCallPerfBenchmark (10047 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (10067 ms total)
[  PASSED  ] 1 test.




// After my patch.
c:\workspace\angle>out\Debug\angle_perftests --gtest_filter=DrawCall*gl_null
WARN: rx::`anonymous-namespace'::GetDesiredPresentMode(47): Present mode 1 not available. Falling back to 0
ERR: egl::Display::initialize(478): ANGLE Display::initialize error 12289: Intel OpenGL ES drivers are not supported.
Skipping tests using configuration ES2_OPENGLES because it is not available.
Note: Google Test filter = DrawCall*gl_null
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from DrawCallPerfBenchmark
[ RUN      ] DrawCallPerfBenchmark.Run/gl_null
*RESULT DrawCallPerf_gl_null: score= 11212 score
[       OK ] DrawCallPerfBenchmark.Run/gl_null (10053 ms)
[----------] 1 test from DrawCallPerfBenchmark (10079 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (10110 ms total)
[  PASSED  ] 1 test.
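
As a rough check on the two debug-build runs above, the relative change between the reported scores can be computed directly. This is only a minimal sketch: the helper name `relative_change` is hypothetical, and one debug-build sample per side is too noisy to be conclusive.

```python
def relative_change(before: float, after: float) -> float:
    """Return the change from `before` to `after` as a fraction of `before`."""
    return (after - before) / before


# Scores copied from the two DrawCallPerf_gl_null runs above.
before_score = 11366
after_score = 11212

change = relative_change(before_score, after_score)
print(f"{change:+.2%}")  # prints "-1.35%"
```

A difference of this size is far smaller than the ~15% reported in the issue title, which is consistent with the conclusion drawn from these two runs.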

😿 Pinpoint job stopped with an error.
https://pinpoint-dot-chromeperf.appspot.com/job/14116898640000

All of the runs failed. The most common error (1/20 runs) was:
IOError: [Errno 2] No such file or directory: 'c:\\b\\s\\w\\itr03n5k\\tmpgtydtgtelemetry\\histograms.json'
Thanks for your answers in #4 Ned.

Do you have any idea why the tests would fail with this error?

Traceback (most recent call last):
  File "/base/data/home/apps/s~chromeperf/pinpoint:clean-dtu-501bed82.411835512812192005/dashboard/pinpoint/models/quest/execution.py", line 95, in Poll
    self._Poll()
  File "/base/data/home/apps/s~chromeperf/pinpoint:clean-dtu-501bed82.411835512812192005/dashboard/pinpoint/models/quest/run_test.py", line 211, in _Poll
    'message was:\n%s' % exception_string)
SwarmingTestError: The test failed. The test's error message was:
IOError: [Errno 2] No such file or directory: 'c:\\b\\s\\w\\it7lyfou\\tmp5mlqiqtelemetry\\histograms.json'
Cc: dtu@chromium.org
+Dave: can you look into #9
The regression showed up as well in the angle_perftests. 
I looked at the swarming task log. The failure occurred because the command line was not set up properly:
c:\infra-system\bin\python.exe ..\..\testing\scripts\run_performance_tests.py ../../tools/perf/run_benchmark --benchmarks passthrough_command_buffer_perftests --story-filter wall.time --pageset-repeat 1 --browser release_x64 -v --upload-results --output-format histograms --isolated-script-test-output c:\b\s\w\ioyplhwt\output.json --isolated-script-test-chartjson-output c:\b\s\w\ioyplhwt\chartjson-output.json --results-label chromium@3c85414


The "passthrough_command_buffer_perftests" benchmark is a special case: the benchmark command should still be command_buffer_perftests. The "passthrough_" prefix only reflects the extra flags passed to it.
jmadill@, at this point, I think we should reconsider building a passthrough_command_buffer_perftests binary that is basically the same as command_buffer_perftests but with the extra flags set.

The mismatch between the benchmark name and the binary name already caused issue 870692, and now it also makes it difficult for bisection to work correctly.
Cc: jmad...@chromium.org
I agree. I think for now it's not a big deal if the bisect doesn't work. We can find the same regression from angle_perftests.

Rewriting the tests to handle multiple configs would require some tricky multi-process launching inside the test. I'm not an expert on this, so maybe we can defer that task until later?
Cc: angle-ch...@skia-buildbots.google.com.iam.gserviceaccount.com
📍 Found a significant difference after 1 commit.
https://pinpoint-dot-chromeperf.appspot.com/job/1249a382640000

Roll src/third_party/angle ea926a362b77..7ae70d8fb360 (1 commits) by angle-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com
https://chromium.googlesource.com/chromium/src/+/13044802db6b6a9e3689e40d30bccb22f8930124
2.33e+05 → 2.106e+05 (-2.236e+04)

Assigning to sheriff ynovikov@chromium.org because "Roll src/third_party/angle ea926a362b77..7ae70d8fb360 (1 commits)" is a roll.

Understanding performance regressions:
  http://g.co/ChromePerformanceRegressions
#16: I am not proposing we merge the binary. Instead, make separate binaries:
command_buffer_perftests binary --> run command_buffer_perftests as before
passthrough_command_buffer_perftests binary --> run command_buffer_perftests with --passthrough flag.
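
One way to picture the mismatch discussed here is a lookup from benchmark name to the binary and extra flags that actually run. The sketch below is hypothetical: the table, the `build_command` helper, and the `--passthrough` placeholder (taken from the wording of the comment above) are assumptions for illustration, not the real runner code or the real flag set.

```python
# Hypothetical table: dashboard benchmark name -> (binary to run, extra flags).
# "--passthrough" is a placeholder borrowed from the comment above; the real
# flags passed to the binary may differ.
BENCHMARKS = {
    "command_buffer_perftests": ("command_buffer_perftests", []),
    "passthrough_command_buffer_perftests": (
        "command_buffer_perftests",
        ["--passthrough"],
    ),
}


def build_command(benchmark, extra_args=()):
    """Return the argv list for a benchmark name, resolving it to its binary."""
    binary, flags = BENCHMARKS[benchmark]
    return [binary, *flags, *extra_args]


print(build_command("passthrough_command_buffer_perftests"))
# prints "['command_buffer_perftests', '--passthrough']"
```

A tool that assumes the benchmark name is also the binary name would skip this lookup and try to launch something that does not exist, which matches the failure described in the swarming log. Separate binaries would make the lookup unnecessary.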

Ned: okay, understood. I still don't know how to implement that. Are you okay if we defer the fix until later? We can get bisect coverage using angle_perftests. The existing test setup gives us a very useful dashboard with a clear history.

Looks like the regression pinpointed in #17 comes from this change: jie.a.chen@intel.com ParallelCompile: Parallelize D3D linking

Jie - can you try re-testing? I suggest building with gn args:

is_debug = false
target_cpu = "x64"

and running scripts/perf_test_runner.py DrawCall*gl_null to produce a high number of iterations and reduce variance. See https://chromium.googlesource.com/angle/angle/+/master/src/tests/perf_tests/README.md
Jamie - I have re-tested as you suggested. It seems to me the difference is trivial. Anyway I will double check the patch.

// Before the patch
c:\workspace\angle>python scripts\perf_test_runner.py DrawCall*gl_null
Using test executable: c:\workspace\angle\out\Release\angle_perftests.exe
Test name: DrawCall*gl_null
score: 288847, mean: 288847.0
score: 293156, mean: 291001.5, stddev: 2154.5
score: 293878, mean: 291960.333333, stddev: 2221.10397976
score: 295307, mean: 292797.0, stddev: 2408.3210957
score: 295050, mean: 293247.6, stddev: 2334.98818841
score: 291564, mean: 292967.0, stddev: 2221.97134695
score: 294563, mean: 293195.0, stddev: 2131.6079779
score: 295320, mean: 293460.625, stddev: 2114.16176873, truncated mean: 293919.666667, stddev: 1274.77536147
score: 295141, mean: 293647.333333, stddev: 2062.02058401, truncated mean: 294094.142857, stddev: 1255.21056043
score: 296962, mean: 293978.8, stddev: 2194.44010171, truncated mean: 294247.375, stddev: 1242.16342901
score: 296575, mean: 294214.818182, stddev: 2221.44870087, truncated mean: 294506.0, stddev: 1380.80556198
score: 296667, mean: 294419.166667, stddev: 2232.24986778, truncated mean: 294722.1, stddev: 1461.59313422
score: 297254, mean: 294637.230769, stddev: 2273.82054477, truncated mean: 294925.727273, stddev: 1535.15152293
score: 294653, mean: 294638.357143, stddev: 2191.11206622, truncated mean: 294903.0, stddev: 1471.7272732
score: 297129, mean: 294804.4, stddev: 2206.10301361, truncated mean: 295074.230769, stddev: 1533.36458377
score: 296435, mean: 294906.3125, stddev: 2172.21136514, truncated mean: 295308.916667, stddev: 1125.93135214
score: 296299, mean: 294988.235294, stddev: 2132.68018422, truncated mean: 295385.076923, stddev: 1113.46710507
score: 296603, mean: 295077.944444, stddev: 2105.33880811, truncated mean: 295472.071429, stddev: 1117.87109302
score: 295017, mean: 295074.736842, stddev: 2049.23150787, truncated mean: 295441.733333, stddev: 1085.91543358
score: 296264, mean: 295134.2, stddev: 2014.09129882, truncated mean: 295493.125, stddev: 1070.10658786
score: 296179, mean: 295183.952381, stddev: 1978.10516539, truncated mean: 295533.470588, stddev: 1050.62452674
score: 295466, mean: 295196.772727, stddev: 1933.5181766, truncated mean: 295529.722222, stddev: 1021.14047383
score: 295824, mean: 295224.043478, stddev: 1895.33919135, truncated mean: 295545.210526, stddev: 996.075011916
score: 279560, mean: 294571.375, stddev: 3638.69239119, truncated mean: 295466.5, stddev: 964.144252116
score: 293172, mean: 294515.4, stddev: 3575.70634141, truncated mean: 295345.736842, stddev: 1069.1849745
score: 294388, mean: 294510.5, stddev: 3506.35413598, truncated mean: 295297.85, stddev: 1062.8116143



// After the patch
c:\workspace\angle>python scripts\perf_test_runner.py DrawCall*gl_null
Using test executable: c:\workspace\angle\out\parallelRelease\angle_perftests.exe
Test name: DrawCall*gl_null
score: 288610, mean: 288610.0
score: 277381, mean: 282995.5, stddev: 5614.5
score: 288446, mean: 284812.333333, stddev: 5255.17271098
score: 289149, mean: 285896.5, stddev: 4923.299935
score: 288104, mean: 286338.0, stddev: 4491.19079978
score: 286116, mean: 286301.0, stddev: 4100.71221456
score: 289333, mean: 286734.142857, stddev: 3941.9860632
score: 289329, mean: 287058.5, stddev: 3785.93452796, truncated mean: 288292.333333, stddev: 1057.11693876
score: 289797, mean: 287362.777778, stddev: 3671.70138515, truncated mean: 288441.0, stddev: 1044.25311655
score: 288300, mean: 287456.5, stddev: 3494.61108709, truncated mean: 288423.375, stddev: 977.921768024
score: 290250, mean: 287710.454545, stddev: 3427.39347991, truncated mean: 288576.0, stddev: 1018.0506427
score: 289526, mean: 287861.75, stddev: 3319.62357618, truncated mean: 288671.0, stddev: 1006.98033744
score: 291121, mean: 288112.461538, stddev: 3305.52436467, truncated mean: 288814.545455, stddev: 1062.01680894
score: 290484, mean: 288281.857143, stddev: 3243.31018862, truncated mean: 288953.666667, stddev: 1116.59820686
score: 290039, mean: 288399.0, stddev: 3163.84331681, truncated mean: 289037.153846, stddev: 1111.09217551
score: 291292, mean: 288579.8125, stddev: 3142.40081822, truncated mean: 289280.583333, stddev: 753.027274665
score: 288640, mean: 288583.352941, stddev: 3048.60947938, truncated mean: 289231.307692, stddev: 743.349114027
score: 286121, mean: 288446.555556, stddev: 3015.92692997, truncated mean: 289009.142857, stddev: 1074.58948023
score: 290484, mean: 288553.789474, stddev: 2970.53398671, truncated mean: 289107.466667, stddev: 1101.4107842
score: 289307, mean: 288591.45, stddev: 2899.96837009, truncated mean: 289119.9375, stddev: 1067.52959378
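
The "truncated mean" column in the logs above appears to drop the lowest and highest eighth of the scores before averaging, and the reported stddev matches a population standard deviation. That rule is inferred from the printed numbers, not taken from perf_test_runner.py, so treat the sketch below as an assumption; the helper name `truncated` is hypothetical.

```python
import statistics


def truncated(scores):
    """Drop the len(scores) // 8 lowest and highest scores (assumed rule)."""
    n = len(scores) // 8
    ordered = sorted(scores)
    return ordered[n:len(ordered) - n]


# First eight "before" scores from the log above.
before = [288847, 293156, 293878, 295307, 295050, 291564, 294563, 295320]

kept = truncated(before)
print(statistics.mean(kept))    # ~293919.67, as on the eighth "before" line
print(statistics.pstdev(kept))  # ~1274.78, population standard deviation

# Relative change between the final truncated means of the two runs.
final_before, final_after = 295297.85, 289119.9375
print(f"{(final_after - final_before) / final_before:+.2%}")  # prints "-2.09%"
```

With fewer than eight scores nothing is dropped under this rule, which fits the first seven log lines printing no truncated mean.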

Jamie, I have found that this small regression of about 2% comes from the overhead of Program::resolveLink in my patch. I managed to reduce it to less than 1%. In any case, I don't think it should take the blame for this 15% regression.
Jie: thanks. Let's try landing your CL and watching the bots. I agree one inlined function should not make a 15% difference. I can investigate more myself when I get back from travel.
Project Member

Comment 23 by bugdroid1@chromium.org, Aug 17

The following revision refers to this bug:
  https://chromium.googlesource.com/angle/angle/+/5055fba5692f8b3904207ec47ab0a8e340341063

commit 5055fba5692f8b3904207ec47ab0a8e340341063
Author: jchen10 <jie.a.chen@intel.com>
Date: Fri Aug 17 23:10:36 2018

Optimize Program::resolveLink

The method has to be extremely fast as it's very frequently called. It
contributes about 2% cpu time in the DrawCall/gl_null benchmark. With
this optimization it can be decreased to less than 1%.

Bug:  chromium:873724 

Change-Id: I7fb376db73452dbdf6cb44c92815848e860867c9
Reviewed-on: https://chromium-review.googlesource.com/1179369
Reviewed-by: Geoff Lang <geofflang@chromium.org>
Commit-Queue: Jie A Chen <jie.a.chen@intel.com>

[modify] https://crrev.com/5055fba5692f8b3904207ec47ab0a8e340341063/src/libANGLE/Program.h
[modify] https://crrev.com/5055fba5692f8b3904207ec47ab0a8e340341063/src/libANGLE/Program.cpp

The graph didn't recover much, so we need to investigate this more. The regression range is only 4 Chrome CLs, so I'm fairly certain it was this CL, but it could be an issue with the benchmark itself, possibly deferring the program linking into the body of the perftest.
Jie: I was able to quite easily reproduce the performance regression in ANGLE standalone with your CL. Is it possible you are not using target_cpu = "x64" in your testing? The regression might only affect 64-bit.
Jamie, I have always been using "x64". Could you tell me more about the platform on your machine? Was it Linux, Win10, or Win7? I will try to find some other machines to reproduce it.
I reproduced it on my Ubuntu desktop. After some profiling, I found the main cause: some frequently called methods were no longer inlined after my CL. With my newly uploaded patch, the score improves by roughly 10%.
Great! Thank you for investigating. Let's try landing your CL and watching the bots.
To answer your earlier question, I was using Windows 10. I can provide more info if needed.
Project Member

Comment 30 by bugdroid1@chromium.org, Aug 22

The following revision refers to this bug:
  https://chromium.googlesource.com/angle/angle/+/87498164675dde4c3fb4179a8adab74b4980fcaf

commit 87498164675dde4c3fb4179a8adab74b4980fcaf
Author: jchen10 <jie.a.chen@intel.com>
Date: Wed Aug 22 02:45:01 2018

Make some Program methods inlined

These methods are very hot in the DrawCall/gl_null benchmark. With this
CL applied, the score can improve about 10% on Linux.

This also removes a few unnecessary resolveLink calls.

Bug:  chromium:873724 

Change-Id: I6034f29eeeebe8341dae3988c38196123687a44f
Reviewed-on: https://chromium-review.googlesource.com/1183522
Commit-Queue: Jie A Chen <jie.a.chen@intel.com>
Reviewed-by: Geoff Lang <geofflang@chromium.org>
Reviewed-by: Jamie Madill <jmadill@chromium.org>

[modify] https://crrev.com/87498164675dde4c3fb4179a8adab74b4980fcaf/src/libANGLE/Program.h
[modify] https://crrev.com/87498164675dde4c3fb4179a8adab74b4980fcaf/src/libANGLE/Program.cpp

Status: Fixed (was: Assigned)
Jie's fix in #30 has landed and made it to the perf dashboards but there has been only slight recovery in https://chromeperf.appspot.com/report?sid=3b11bf2aa2cf2281472588fed4dfc1e0a9f514ff10bd9db09cb208c5306ac3c9 or in https://chromeperf.appspot.com/report?sid=4f05681c83914fc3b8b5dacb47f0dfd17e36cdcc3598a753b9361cde6cd9b785 . I've tested Jie's change repeatedly locally. It seems there may be more time being spent in StateManagerGL::setGenericShaderState but it's not clear why. Closing this as fixed as Jie made several CLs that addressed the problem at least partially.
Blocking: angleproject:2851
Blocking: 849576
Blockedon: angleproject:3031
