Issue 751028

Starred by 2 users

Issue metadata

Status: WontFix
Owner: kbr@chromium.org
Closed: Aug 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Windows
Pri: 2
Type: Bug




The FPS of the WebGL aquarium with 4000 fish drops on Windows

Reported by canx....@intel.com, Aug 1 2017

Issue description

UserAgent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3159.5 Safari/537.36

Steps to reproduce the problem:
1. Go to https://webglsamples.org/aquarium/aquarium.html
2. Choose 4000 fish
3. Check the fps value

What is the expected behavior?
44 fps with integrated graphics (Intel HD Graphics 630).
60 fps with a discrete GPU.

What went wrong?
33~36 fps with integrated graphics (Intel HD Graphics 630).
53~57 fps with a discrete GPU.

Did this work before? Yes 61.0.3155.0

Chrome version: 62.0.3171.0  Channel: canary
OS Version: 10.0
Flash Version: 

My test CPUs:
Intel Skylake i7-6700K/Kabylake i7-7700/Kabylake i7-7600/Kabylake i3-7100/AMD Ryzen 7 1800x
 

Comment 1 by canx....@intel.com, Aug 2 2017

Update CPU info: Kabylake i5-7600
Labels: Needs-Triage-M62
Labels: TE-NeedsTriageFromMTV
Could someone from MTV look into this issue, as we don't have the reported configuration? Adding the "TE-NeedsTriageFromMTV" label for further triage.

Comment 4 by kochi@chromium.org, Aug 7 2017

Components: Blink>WebGL Internals>GPU
Labels: Performance
canx.cao@ if possible, could you bisect to find which revision the
regression started at?

Comment 5 by kochi@chromium.org, Aug 7 2017

Cc: kochi@chromium.org

Comment 6 by zmo@chromium.org, Aug 7 2017

Cc: yunchao...@intel.com yang...@intel.com
Cc: jbau...@chromium.org ligim...@chromium.org
Labels: ReleaseBlock-Stable Needs-Bisect
John, do you have a Windows system with a Kaby Lake CPU for trying a repro?
Cc: vmi...@chromium.org
vmiura@ currently has the kaby lake system.

Comment 9 by canx....@intel.com, Aug 8 2017

kochi@, I tried to find it with the bisect method, but could not find the commit that causes the regression. I always get a ‘bad’ result with the bisect method.

BTW, I got a ‘good’ result when running 61.0.3155.0 (r485784) installed locally, but got a ‘bad’ result with the bisect-builds.py build.
canx.cao@ thanks for checking with bisects.
I'd hoped that on the integrated GPU configuration you would see the fps
drop from 44 fps to around 33 fps at some point, but did you get the same fps throughout?
What were your criteria for determining good or bad?
kochi@
The good and bad are the results from the bisect method.
My criterion is whether the fps value differs from Chrome 61.0.3155.0.
According to the bug description:
"What is the expected behavior?
44 fps with graphics card (Intel HD Graphic 630).
60 fps with discrete GPU.

What went wrong?
33~36 fps with graphics card (Intel HD Graphic 630).
53~57 fps with discrete GPU.
"
canx.cao@ I'm a bit confused about whether I understand your statement right.

When you build Chromium yourself, or when you use a binary obtained
via bisect-builds.py, you see 33-36 fps with Intel HD Graphics 630
even with 61.0.3155.0 (as you said, you always get a ‘bad’ result by bisect)?

But the release build (beta channel?) of 61.0.3155.0 worked fine (44 fps
on Intel HD Graphics 630)?

And with the release build of canary 62.0.3171.0 you see definitely regressed
performance.

Am I understanding you correctly?

Anyway, since a performance regression is observable even with a discrete
GPU, I optimistically hope some attempts at bisecting could find the culprit.
(Sorry, I am doing these replies as part of my bug labeling sheriff
shift; I don't work on this area myself...)

Comment 13 by yang...@intel.com, Aug 8 2017

I just sat together with Can Cao, and we did some experiments with multiple builds. Below are the results (canary and dev mean Chrome downloaded via the canary and dev channels, and bisect means Chrome downloaded via bisect-builds.py):
Version, Revision, FPS, Channel     
61.0.3155.0, 485750, 44, bisect
61.0.3155.0, 485784, 53, canary
61.0.3155.0, 485822, 44, bisect
62.0.3171.0, 490649, 48, canary 
62.0.3175.3, 3175#4, 44, dev
62.0.3179.0, 492427, 46, bisect
62.0.3179.0, 492477, 46, canary

Some observations:
1. The data are from the Intel integrated GPU, but with other configurations, such as a discrete GPU, the performance regression is also obvious. Here the good value is 53, while the others (44-48) are bad.
2. Canary r485784 is the latest one we know has good FPS, while canary r490649 is the oldest one we know has bad FPS. We didn't test revisions in between, and we don't cache them. BTW, is there a way to download every canary? We have regular tests on this benchmark, so we may assume the canaries before r485784 are also good.
3. We downloaded binaries using bisect-builds.py and tried to bisect the issue. But even with revisions very close to the canary revisions, we always got bad FPS (44-48).
4. We double-checked the latest dev channel, and still got bad FPS.

Just a guess: the regression might not be caused by a specific code change, but by the way these binaries were built. A suggestion is to bisect canaries between r485784 and r490649 on any hardware configuration at your side, as we don't have the canary builds in between.

Comment 14 by sheriffbot@chromium.org, Aug 8 2017

This issue is marked as a release blocker with no milestone associated. Please add an appropriate milestone.

All release blocking issues should have milestones associated with them, so that the issue can be tracked and the fixes can be pushed promptly.

Thanks for your time! To disable nags, add the Disable-Nags label.

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot

Comment 15 by kbr@chromium.org, Aug 8 2017

Labels: -ReleaseBlock-Stable
Sorry but I don't think I can reproduce this report in house. Here are the FPS numbers I get for the WebGL Aquarium with 4000 fish on a Razer Blade Stealth with Intel HD Graphics 620:

61.0.3163.31 (Official Build) beta (64-bit) (cohort: Beta): 27-28 FPS, with occasional jumps to 33-34 FPS
(built at 3163@{#266})

62.0.3179.0 (Official Build) canary (64-bit) (cohort: Clang-64): 29-30 FPS
(Built at r492477)

Unable to measure a significant performance difference between these channels here.

Could you try with the --enable-benchmarking command-line flag? Chrome has a number of field trials that it could be testing at any one time. They may be enabled or disabled depending on a random number specific to the profile you're using, or depending on what channel you're on.

That flag disables all field trials to hopefully make the runs across different versions more consistent.
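
For example, a hypothetical invocation on Windows (the chrome.exe path depends on your install; --no-first-run is the same flag used with the bisect command later in this thread):

chrome.exe --no-first-run --enable-benchmarking https://webglsamples.org/aquarium/aquarium.html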

Comment 17 by kbr@chromium.org, Aug 8 2017

Owner: kbr@chromium.org
Thanks for the suggestion. It does look like there's a performance difference with that flag:

Beta: 36-37 FPS
Canary: 32-33 FPS

I'm trying to get the per-revision bisect script set up on this machine. Will follow up.

Also, about:version contains a list of variation IDs, which people at Google can map to info about which experiments are being used. That may help if we think a field trial would be causing the issue.

Comment 19 by kbr@chromium.org, Aug 9 2017

Components: -Internals>GPU Internals>GPU>ANGLE
Labels: -Needs-Bisect
Owner: geoffl...@chromium.org
This problem was difficult to bisect and I'm not fully confident in the result. At first I looked at the initial performance numbers for 4000 fish and tried to discriminate good and bad builds based on that. That wasn't a good metric. All of the builds started off running 4000 fish quite slowly -- around 16 FPS -- but after several seconds they reached a certain peak performance number. I unsuccessfully probed back to r483000 in the build archive and eventually ran the bisect with Yang's revisions:

python bisect_builds.py --use-local-cache -o -a win64 -g 485784 -b 490649 -- --no-first-run --enable-benchmarking https://webglsamples.org/aquarium/aquarium.html

Good builds reached about 32-33 FPS on this machine. Bad builds reached 30 FPS maximum, and usually less.

With this discriminator the result was:

You are probably looking for a change made after 490155 (known good), but no later than 490156 (first known bad).
CHANGELOG URL:
  https://chromium.googlesource.com/chromium/src/+log/c4dd365c1e19b80c5e4ec45a636e9601364848a2..5d1a1066a1eb50b250204e3f86d582502af0e9c1
The script might not always return single CL as suspect as some perf builds might get missing due to failure.

which is this ANGLE roll:

https://chromium.googlesource.com/angle/angle.git/+log/0d2ecb4..40ac783

Geoff, can you see whether you can reproduce the slowdown at this Chromium revision on an Intel GPU? If so, could you please bisect into that ANGLE roll? There are quite a few potential culprits based on CL descriptions. Thanks.
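
For reference, one possible way to bisect into that roll locally (a rough sketch, assuming a standard Chromium checkout with third_party/angle as a git repository; the exact build and measurement steps may differ):

cd third_party/angle
git bisect start 40ac783 0d2ecb4
# At each step: rebuild Chrome, measure the aquarium FPS with the discriminator above, then mark the revision.
autoninja -C ../../out/Release chrome
git bisect good   # or: git bisect bad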

Comment 20 by yang...@intel.com, Aug 9 2017

Ken, thanks for following up on this issue!
But as I mentioned in #13, we couldn't use bisect_builds.py to find the culprit, as we always got bad results. The actual issue is that we once saw good performance with Chrome Canary 61.0.3155.0, but we never saw it again, either with later Canaries (we didn't test any other canaries until 62.0.3171.0 (r490649)) or with any revisions downloaded using bisect_builds.py, even revisions close to that good Canary 61.0.3155.0 r485784 (r485750 and r485822).
As for the test method, we did wait a few seconds until we got a relatively stable FPS. To me, the FPS value is stable enough (the fluctuation is within 3 FPS), and the performance regression is larger than this fluctuation. Note that this problem is not limited to Intel GPUs. On another desktop with an NVIDIA GTX 1050 Ti, I saw 34 FPS vs. 27 FPS between the good canary and the latest canary after I changed the fish number to 9000 (I got 60 FPS with 4000 fish, so I modified the application).
I still suggest checking your internal canaries directly, rather than using bisect_builds.py. You may want to check Canary 61.0.3155.0 first. Sorry that, due to the attachment size limit, I couldn't attach it here (I tried splitting it into 5 parts to work around the 10 MB limit, but the upload failed). For your convenience, I put a copy at https://drive.google.com/open?id=0B8k7zWC0icj7QTlDcVk2V2pmVEU.
BTW, I did look at chrome://gpu for any differences between the good canary and the other builds, but in vain.

A related question: is there a way to download the Chrome Canary offline installer directly, or even a range of them, like the internal builds that can be downloaded via bisect_builds.py? If we could do this, we could spend more effort nailing down some issues. The current situation is:
1. Chrome has 4 release channels: canary, dev, beta and stable. For the latter three, I can append standalone=1 to the download URL to get the offline binary. But for canary, this doesn't work.
2. For now we back up the canary offline installer from %AppData%\Local\Google\Update\Download. But a problem with this is that if we install it directly, it can't coexist with the other channels (even the desktop icon is not a canary icon, though the version is correct). I guess the online canary installer must do some tricky things beyond simply calling this installer.
3. The builds that can be downloaded via bisect_builds.py follow the pattern 'http://commondatastorage.googleapis.com/chromium-browser-snapshots/%s%s/%s/chrome-%s.zip --show-progress -O %s' % (target_os_1, target_arch, rev_str, target_os_2). With this, we can easily download any revision (see the example after this list). We hope to be able to do the same for all 4 channels.
4. Another limitation of bisect_builds.py is that it doesn't support Android. If you could also provide chrome_public builds, that would be more convenient.
I know these requests require extra effort on your side. But if you can expose these builds, it would be more convenient for developers outside of Google, and you could leverage more external resources to investigate some issues. If you think some of these are reasonable requests, I'd like to create separate issues to track them.
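
For example, a hypothetical expansion of the snapshot pattern in (3) for a 64-bit Windows build of r490649 (the exact bucket layout and archive name may differ):

wget --show-progress -O chrome-win32.zip http://commondatastorage.googleapis.com/chromium-browser-snapshots/Win_x64/490649/chrome-win32.zip
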
Is it possible that a Finch trial ("variations" in chrome://version) is affecting the results? There may be a flag to turn off all Finch trials but I don't know what it is. --reset-variation-state can be used to re-randomize the variations. Not sure if they can be individually controlled without knowing what the un-hashed names are for each feature.

We can look up variations hashes for you if this turns out to be a culprit.

Comment 22 by kbr@chromium.org, Aug 10 2017

Yang: sorry I didn't clarify this, but I ran this bisect against our internal per-revision perf build archive (the "-o" flag, which isn't available in the public script). These should be close in behavior to the Canary channel. It's the best tool we have available. I'm sorry, but I can't afford the time to redo this bisect more manually. It was already quite difficult to reproduce on our hardware.

jbauman@ pointed out above that the --enable-benchmarking command line argument ignores all Finch experiments. Note that I specified this when running the per-revision bisect, above.

It's rare that a performance problem (one which isn't affected by Finch experiments) shows up only in the official builds and not the continuous build archive.

There deliberately isn't a way to download earlier official builds from any of the channels. Consequently, I'm sorry, but I can't help with requests (1), (2) and (3).

Request (4) is a good one and I'm surprised this hasn't been implemented yet. Could you please file a bug about it and I'll raise the priority? Thanks.

In the meantime, please investigate the ANGLE roll I pointed to as the potential cause of the performance regression. Please profile things before and after that roll and see if you see anything different in the GPU process.

Comment 23 by yang...@intel.com, Aug 10 2017

kbr@, I filed request (4) as crbug.com/754132 and cc'ed you. Thank you for the help!

I just tried the ANGLE roll you mentioned. I downloaded r490155 and r490175 (the minimum revision greater than r490155 that I could download from your server) and tested them with the aquarium. But I don't see an obvious perf change, according to the results:
fish_number fps_r490155 fps_r490175
6000        37          37
7000        31          31
8000        28          27
9000        25          24

I have used your bisect tool to bisect many regressions (10+), and this is the first time the script hasn't worked, so I'm very curious about the root cause. I did more experiments today and have a new finding: the latest Beta channel (61.0.3163.39, branched from r488528) also has good performance. Below are my collected data:
fish_number fps_beta_61.0.3163.39(r488528) fps_revision_r488538 fps_canary_62.0.3181.0(r493197) fps_revision_r493200
6000        49                            38                  39                             39   
7000        42                            33                  34                             34
8000        37                            29                  30                             30
9000        33                            26                  27                             28

Observations:
1. The latest Beta channel also reproduces the good performance, while the revision closest to the latest Beta (r488538, vs. the Beta branch point r488528) still has a bad result. The perf diff is big (49 vs. 38 for 6000 fish).
2. The latest canary does not have good performance.
3. I tried "--enable-benchmarking" with the good revision, but it had no effect. Should I still see many hashes under Variations in chrome://version with this option?
4. I know "--use-passthrough-cmd-decoder" improves the perf of this case a lot, given that the fish count is very large. But it's not the reason for this issue, as it still improved the FPS a lot when I used the option.

I know that tracking down this kind of issue consumes a lot of time. May I have one last request: could you please try the latest beta channel (61.0.3163.39) and the latest canary channel (62.0.3181.0) separately on your machine with an Intel GPU and 4000 fish? If the FPS diff is not obvious on your side, I'm fine with closing this issue.

PS. A bit of background for this issue: I'm leading an effort at Intel to track the performance of some web benchmarks across different hardware platforms and browsers. For Chrome, we use the latest Canary and conduct bi-weekly testing. This issue was caught by recent routine testing.

Comment 24 Deleted

Comment 25 by kbr@chromium.org, Aug 11 2017

Cc: h...@chromium.org thakis@chromium.org asvitk...@chromium.org geoffl...@chromium.org isherman@chromium.org
Owner: kbr@chromium.org
61.0.3163.39 (Official Build) beta (64-bit) (cohort: Beta)
4000 fish: 32-36 FPS (varies a lot)

62.0.3181.0 (Official Build) canary (64-bit) (cohort: Clang-64)
4000 fish: 30-31 FPS

The FPS is highly variable with beta. Perhaps there is a measurable performance regression, though not as large as the one reported above. Here I am seeing 91% of Beta's performance with Canary, while the report indicates it's as low as 78%. But a LOT has changed between beta and Canary, including switching from MSVC to Clang for building the browser.

thakis, hans: have the Chromium continuous builds switched to being built with Clang, too? Or is it just the official builds which have switched?

I don't know the full behavior of --enable-benchmarking. isherman@, asvitkine@: is it expected that the variations hashes would still show up in about:version if that flag's passed in?

Comment 26 by kbr@chromium.org, Aug 11 2017

BTW, --enable-benchmarking doesn't affect the performance of the Canary installation on this machine.

Everything not explicitly choosing msvc now uses clang. Most bots don't explicitly choose msvc. Perf waterfall bots now use clang.

You can look at the bottom of about:version to check whether a binary was built by clang, msvc, or msvc pgo. It's possible that pgo helps a lot for this test for some reason. We only use PGO on a special builder, and these builds are used for nothing except releases. They go into a different build archive than regular builds (and clang builds currently go to a third archive). We switch between these archives via an a/b test mechanism. It's possible there's no regression in any of the archives, but switching between them is what causes the difference. Checking about:version could confirm this theory.

Comment 28 by yang...@intel.com, Aug 11 2017

MSVC vs. Clang is not the reason: if Clang were worse, that couldn't explain why we always see bad performance for the continuous builds, which were built with MSVC. BTW, I double checked that the continuous builds have now been switched to Clang.
PGO may not be the main reason either. 61.0.3155.0 and 62.0.3171.0 are two canaries both built with MSVC PGO, but I still see a big perf diff between them (it seems to me that with PGO we get a bit better performance). For continuous builds, PGO is not used. And I think PGO is not enabled for Clang builds?

As this issue couldn't be reproduced well on your machines, and it's not really a blocking issue on my side, I'm fine with closing it. I really appreciate your time on this, kbr@ and others. Thanks a lot!

We will continue regular testing on this case, and will report back if we have new findings.
Closing sounds fine to me.

My theory was that "not pgo" could be the reason, since continuous builds don't use pgo but canary builds do. But if you see the problem on a pgo build, then this can't be true, as you say.
yang.gu: If you're able to consistently repro and are certain it's something between 61.0.3155.0 and 62.0.3171.0, I suggest using bisect-builds.py to bisect it. It's pretty easy to use, only takes a few minutes, and has a good chance of finding the CL range that regressed this if it was indeed caused by a CL:

https://www.chromium.org/developers/bisect-builds-py
They tried that; see above.

Comment 32 by kbr@chromium.org, Aug 11 2017

Status: WontFix (was: Unconfirmed)
OK. Sorry Yang that we couldn't definitively get to the bottom of this issue. Please continue to work with us on the continuous builds and profile things; if something obvious is at the top of the profile, then eliminating it might bring the continuous builds' performance closer to the PGO builds.

Comment 33 by yang...@intel.com, Aug 13 2017

Ken, I already did some detailed investigation of this case a while ago, and here are some findings:
1. We compared this case between Chrome and Edge. To get comparable results (not too close to 60 and not too low), I changed the benchmark a bit and used 9000 fish instead. With this configuration, Chrome gets 31 FPS while Edge gets 42 FPS on a powerful desktop.
2. The CPU looks like the actual bottleneck, while the GPU is relatively idle.
3. We can get 60 FPS under Linux, and if we use the OpenGL driver directly on Windows (--use-gl=desktop), we can also get 60 FPS (see the example commands after this list). So I think ANGLE has a big impact on performance, and we may need to tune ANGLE further. We have some initial ideas on this, and want to bring Chrome-like tracing into ANGLE first to ease the investigation. This is a TODO on our list now.
4. There are 3 validations for this case: in the command buffer client, the command buffer service, and ANGLE.
The first validation is lightweight; if we disable it, we get a minor perf gain (< 1 FPS).
Chromium is migrating the second validation to ANGLE. Once the migration is done, the second validation can be removed entirely. The option "--use-passthrough-cmd-decoder" can be used now to bypass the second validation; with it, FPS improves from 31 to 40.
The third validation, in ANGLE, comes with an optimization: it determines the indexRange so that not all data has to be converted to a D3D-compatible type before being passed to the D3D driver. However, this optimization is a net negative for this case, since all the data needs to be converted anyway, so there is no need to calculate the indexRange over and over. If we disable this validation and optimization, FPS improves from 31 to 37.
If we disable all the validations above, FPS is 44, which is similar to Edge.
Work on the second validation is almost done, and we're working on some optimization of the third validation in ANGLE (https://chromium-review.googlesource.com/c/607413).
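
For reference, hypothetical invocations with the flags mentioned in (3) and (4) above, for an A/B comparison (the chrome.exe path depends on your install, and flag behavior may change between versions):

chrome.exe --no-first-run --enable-benchmarking --use-gl=desktop https://webglsamples.org/aquarium/aquarium.html
chrome.exe --no-first-run --enable-benchmarking --use-passthrough-cmd-decoder https://webglsamples.org/aquarium/aquarium.html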

One final note: our change to this case might make it uncommon (think of 9000 DrawElements calls per frame), so I'm not sure whether Chrome should pay much attention to it. Anyway, we'll try to optimize some common parts so that general cases also benefit.

Anyway, if in the future you happen to find some magic in your official builds (Canary, Dev, Beta and Stable) that has a big performance impact on this case, please let us know here. Thanks!

Comment 34 by kbr@chromium.org, Aug 15 2017

Thanks Yang. Those are great investigations. We definitely want to improve Chrome's performance on high-end workloads that make lots of draw calls. There are other customers requesting this case be improved. As you know, the short-term goal is to switch to the pass-through command buffer on Windows, so the double-validation imposed by the command buffer will be removed soon. We appreciate your help investigating the other performance overhead imposed by index validation and look forward to turning it off on platforms supporting KHR_robust_buffer_access_behavior (including implicitly all D3D based backends).

Comment 35 by yang...@intel.com, Aug 15 2017

Do you want me to create another issue to track the performance gap between Chrome and Edge on this case (I searched and could not find an existing one)?
Anyway, we will continue to improve the performance of this case.

Comment 36 by kbr@chromium.org, Aug 15 2017

Yes, that would be great, thanks.
