elm: 16% webgl aquarium regression due to .gyp -> .gn (R53-stable and R54-beta)
Issue description

Forked from http://crosbug.com/p/58752. There has been a clear regression in the 1000-fish graphics_WebGLAquarium benchmark between R53 stable and R54 beta; see the attached graph. From the graph, there appear to have been two regressions. The first appears to have occurred between:

8561.0.0  54.0.2787.0  52.791000        2016/07/08
8562.0.0  54.0.2790.0  47.277935  -12%  2016/07/08

The Chrome OS Chrome switched over from .gyp to .gn between 54.0.2787.0 & 54.0.2790.0. Manually building 54.0.2787.0 & 54.0.2790.0 with SimpleChrome & .gn (after fixing build issues) shows the same regression. Is there a way to still build SimpleChrome with .gyp so we can do a side-by-side comparison?
Oct 24 2016
54 should still support GYP, so it should be possible to use a version of third_party/chromite that still uses gyp. That said, I'm not sure how you would use that. We were pretty careful about checking the compile flags during the migration, but it is possible that something got missed. Is this affecting any other benchmarks? Honestly, I'm not even sure where to begin investigating this. Switching back to GYP is really not an option. llozano@, any other thoughts?
Oct 24 2016
We can run "perf" on the benchmark for an image from before and one from after. We can also re-compare the compile flags. This week I will be on vacation for half of the week; can this wait until next week?
Oct 24 2016
@1 - I didn't do anything to enable .afdo this time. I'll try again with the instructions from: https://bugs.chromium.org/p/chromium/issues/detail?id=629593#c19 @2 - I don't remember how to build with .gyp and the instructions have been removed from: http://www.chromium.org/chromium-os/how-tos-and-troubleshooting/building-chromium-browser What commands do I issue to build with .gyp?
Oct 24 2016
To build with GYP in 54 for SimpleChrome, the following *should* work (note: this won't work in 55, since we removed the gyp files):

1. Enter SimpleChrome
2. unset GYP_CHROMIUM_NO_ACTION
3. gclient runhooks

You may need to wipe your out/Release directory (or move it to something like out_gn/Release) before step 3.
Oct 24 2016
Building with SimpleChrome & .gyp (no .afdo?):

54.0.2787.0 = 52.9243273293
54.0.2790.0 = 52.9556647624
Oct 24 2016
simplechrome with GYP and with GN does not enable AFDO unless you do what you suggested in #4. It is also possible that AFDO does not make a difference for this benchmark.
Oct 25 2016
I will be on vacation, and the rest of my team is quite busy this week dealing with LLVM migration issues. I have a new engineer trying to help with this, with help from more senior engineers. So, please be patient.
Oct 25 2016
BTW, is this only happening on elm, or does it appear on other boards? What is the architecture of elm? +ihf@
Oct 26 2016
elm is ARM64, but its user space is compiled as ARM32. It is a little bit special, and the compiler flags used for it are a little bit different.
Oct 26 2016
So, there are multiple regressions. And there are horrible regressions on Intel which deserve a new issue. But I don't think other boards show much of an impact at build 8561.0.0 (much later).
Oct 26 2016
For the record, the worst regression, which is also visible on many Intel devices, happened in https://crosland.corp.google.com/log/8777.0.0..8778.0.0 and was obviously integrated into M54. But I bet that it was in Chrome:
https://chromium.googlesource.com/chromium/src/+log/55.0.2846.0..55.0.2852.0?pretty=fuller&n=10000
Unfortunately, we are missing M54 coverage for a month after branching; otherwise the change might be easier to locate.
https://cros-goldeneye.corp.google.com/chromeos/console/listCrosbolt?graphSKU=tidus_intel_broadwell_celeron_3205U_2Gb&graphTest=graphics_WebGLAquarium%2Favg_fps_1000_fishes.avg_fps_1000_fishes
Oct 26 2016
So I should state again that my previous comment is mostly for Intel devices and has nothing to do with the 2 elm regressions. I filed issue 659438 for it.
Oct 26 2016
I did a spot check for different flags set by the .gyp versus the .gn build.

Only .gyp has these flags enabled:
-DUSE_PANGO=1 -DUSE_CAIRO=1 -DENABLE_HANGOUT_SERVICES_EXTENSION=1 -Wno-extra -g

Only .gn has these flags enabled:
-Wno-psabi -g2 -D_LARGEFILE64_SOURCE

.gyp only uses this flag in ffmpeg & video codecs:
-D_LARGEFILE_SOURCE
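A sketch of how such a spot check can be automated: pull the compiler flags out of two verbose build logs and diff them. The log names here are hypothetical; you would capture them with "ninja -v".

```shell
# Extract one flag per line, deduplicated and sorted (comm(1) needs sorted input).
extract_flags() {
  tr ' ' '\n' < "$1" | grep -E '^-(D|W|O|f|g)' | sort -u
}
# Usage (log names are hypothetical):
#   extract_flags gyp.log > gyp_flags.txt
#   extract_flags gn.log  > gn_flags.txt
#   comm -23 gyp_flags.txt gn_flags.txt   # flags only in the GYP build
#   comm -13 gyp_flags.txt gn_flags.txt   # flags only in the GN build
```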
Oct 26 2016
I enabled -DUSE_PANGO=1 & -DUSE_CAIRO=1, and disabled -D_LARGEFILE64_SOURCE for the .gn build, and it doesn't make a difference. At this point, I think we need help from the compiler team to use their bisect script.
Oct 26 2016
Attached are webgl aquarium 1000 fish traces of 54.0.2790.0 compiled with .gyp & .gn.
Oct 28 2016
I've also uploaded two patches that allow building 54.0.2790.0 with both .gyp & .gn, to allow head-to-head comparisons:

https://chromium-review.googlesource.com/403695 fix gyp swiftshader
https://chromium-review.googlesource.com/403696 Disable swiftshader for .gn build

Attached are traces for the .gyp & .gn builds taken with just 1 fish, and the CPU & GPU freqs forced to minimum. This should reduce any DVFS-related sample noise and accentuate any slow traces.

# Set freqs to min
echo 253500000 > /sys/class/devfreq/13000000.mfgsys-gpu/min_freq
echo 253500000 > /sys/class/devfreq/13000000.mfgsys-gpu/max_freq
echo 507000 > /sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq
echo 507000 > /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
echo 507000 > /sys/devices/system/cpu/cpu2/cpufreq/scaling_min_freq
echo 507000 > /sys/devices/system/cpu/cpu2/cpufreq/scaling_max_freq

# Restore interactive freq range
echo 253500000 > /sys/class/devfreq/13000000.mfgsys-gpu/min_freq
echo 598000000 > /sys/class/devfreq/13000000.mfgsys-gpu/max_freq
echo 507000 > /sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq
echo 1703000 > /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
echo 507000 > /sys/devices/system/cpu/cpu2/cpufreq/scaling_min_freq
echo 2106000 > /sys/devices/system/cpu/cpu2/cpufreq/scaling_max_freq

test_that --fast ${IP} graphics_WebGLAquarium

Results:

gyp interactive
-------------------------------------------------
avg_fps_0050_fishes 58.1829081019
avg_fps_1000_fishes 50.224662878
avg_interframe_time_0050_fishes 0.0171871780326
avg_interframe_time_1000_fishes 0.0199092714992
avg_render_time_0050_fishes 0.00814816826268
avg_render_time_1000_fishes 0.0120694943408
meminfo_MemUsed 1360468
meminfo_SwapUsed 0
std_interframe_time_0050_fishes 0.0289895137422
std_interframe_time_1000_fishes 0.0328220678949
std_render_time_0050_fishes 0.00214724899737
std_render_time_1000_fishes 0.00247157428033

gyp slow
-------------------------------------------------
avg_fps_0050_fishes 34.0780636232
avg_fps_1000_fishes 20.2655484877
avg_interframe_time_0050_fishes 0.0293479532759
avg_interframe_time_1000_fishes 0.0493393444624
avg_render_time_0050_fishes 0.0147758290084
avg_render_time_1000_fishes 0.0436513974116
meminfo_MemUsed 1360032
meminfo_SwapUsed 0
std_interframe_time_0050_fishes 0.117042959368
std_interframe_time_1000_fishes 0.148837118996
std_render_time_0050_fishes 0.00542782962008
std_render_time_1000_fishes 0.00882979154474

gn interactive
-------------------------------------------------
avg_fps_0050_fishes 57.3966861547
avg_fps_1000_fishes 48.6229061315
avg_interframe_time_0050_fishes 0.0174226086382
avg_interframe_time_1000_fishes 0.0205667351795
avg_render_time_0050_fishes 0.00898145620958
avg_render_time_1000_fishes 0.0126094364761
meminfo_MemUsed 1428976
meminfo_SwapUsed 0
std_interframe_time_0050_fishes 0.0291992164523
std_interframe_time_1000_fishes 0.0365553556365
std_render_time_0050_fishes 0.00271853990102
std_render_time_1000_fishes 0.00267530421193

gn slow
-------------------------------------------------
avg_fps_0050_fishes 32.22159417
avg_fps_1000_fishes 17.9194787083
avg_interframe_time_0050_fishes 0.0310412370053
avg_interframe_time_1000_fishes 0.0558055555379
avg_render_time_0050_fishes 0.0166992794964
avg_render_time_1000_fishes 0.0501053588888
meminfo_MemUsed 1456356
meminfo_SwapUsed 0
std_interframe_time_0050_fishes 0.134101474014
std_interframe_time_1000_fishes 0.17443441754
std_render_time_0050_fishes 0.00578359435842
std_render_time_1000_fishes 0.00836116544578

Note, the autotest serves a copy of webgl aquarium locally. Once you run graphics_WebGLAquarium, you can thereafter manually start a python webserver:

cd /usr/local/autotest/tests/graphics_WebGLAquarium/
python -m SimpleHTTPServer 58544 &

And browse to the local aquarium page: http://localhost:58544/aquarium.html
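As a consistency check on the tables above, avg_fps should be roughly the reciprocal of avg_interframe_time; a small sketch over two of the reported pairs (both products should print ~1.000):

```shell
# fps * interframe_time ~= 1.0 if the two metrics are self-consistent.
check() { awk -v fps="$1" -v ift="$2" 'BEGIN { printf "%.3f\n", fps * ift }'; }
check 50.224662878  0.0199092714992   # gyp interactive, 1000 fishes
check 17.9194787083 0.0558055555379   # gn slow, 1000 fishes
```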
Oct 28 2016
djkurtz@, removing the GPU_IMPLEMENTATION flag from gpu/BUILD.gn resulted in an improvement on my machine. However, the numbers vary from run to run, so I need you to verify on your end.
Oct 29 2016
No, I don't think "GPU_IMPLEMENTATION" makes a noticeable difference.

.gn  = 45.8767520962
.gyp = 50.6645349801
Oct 31 2016
When marcheu looked at the traces in #18, he thought v8 looked a bit slower in .gn than in .gyp. Attached are screenshots from JetStream (a JavaScript benchmark) for the .gn & .gyp builds. To minimize variation, I ran these tests with the big cores disabled and the little cores pinned to their max OPP:

# Set littles to max freq, GPU/bigs to min, disable bigs
echo 253500000 > /sys/class/devfreq/13000000.mfgsys-gpu/min_freq
echo 253500000 > /sys/class/devfreq/13000000.mfgsys-gpu/max_freq
echo 1703000 > /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
echo 1703000 > /sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq
echo 507000 > /sys/devices/system/cpu/cpu2/cpufreq/scaling_min_freq
echo 507000 > /sys/devices/system/cpu/cpu2/cpufreq/scaling_max_freq
echo 0 > /sys/devices/system/cpu/cpu2/online
echo 0 > /sys/devices/system/cpu/cpu3/online

Results from 2 runs of 3 iterations:

.gyp 24.678 +/- 0.23634, 24.711 +/- 0.711
.gn  24.137 +/- 0.69827, 24.023 +/- 0.85039

The results do show that the .gn build is a tiny bit slower than .gyp, but not by much (~2.5%). The final benchmark score is a geometric mean of lots of subtests. The subtests are grouped into two categories, "Latency" and "Throughput". "Latency" is ~5% better for .gn, whereas "Throughput" is essentially the same (< 1% delta). There is one outlier: the "splay-latency" score is about 20% higher on .gn. Does anyone know enough about these tests and/or V8 to take a look at these results and get a clue as to why the "latency" subtest scores are better with .gn than .gyp?

I did a quick check of the build flags using the attached "parse_build.py" script, and I didn't see much difference:

../../v8/src/disassembler.cc
Only in gyp: set(['-DICU_UTIL_DATA_IMPL=ICU_UTIL_DATA_FILE', '-DUSE_LIBJPEG_TURBO=1', '-DUSE_LIBPCI=1', '-DCR_CLANG_REVISION=274369-1', '-DUSE_CRAS=1', '-DENABLE_HANGOUT_SERVICES_EXTENSION=1'])
Only in gn: set(['-DOFFICIAL_BUILD'])
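Since the final JetStream score is a geometric mean, small per-subtest deltas combine multiplicatively rather than additively. A sketch of the computation over a list of scores (using the two .gyp run totals above merely as sample inputs):

```shell
# Geometric mean via awk: exp(mean(log(x))) over one score per input line.
geomean() { awk '{ s += log($1); n++ } END { printf "%.4f\n", exp(s / n) }'; }
printf '24.678\n24.711\n' | geomean
```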
Nov 3 2016
Manoj is trying to get "perf profiles" to see if we can get a clue from there. We are still looking at the differences in the command-line options. I will be at a conference for the next couple of days.
Nov 8 2016
I have the perf profiles for the two builds, one using gyp and one using gn, and am trying to see if I can spot something.
Nov 9 2016
manoj had suggested trying use_libpci=true in gpu/config/BUILD.gn. Unfortunately, that had no noticeable effect:

avg_fps_1000_fishes @ lowest OPP:
use_libpci=<default> 17.224753127
use_libpci=true      17.9218621253
.gyp                 20.2655484877
Nov 11 2016
From the perf profiles, skia::BGRAConvolve2D takes longer on the GN build than on the GYP build. I have to check what the generated code for this function looks like on the two builds.
Nov 11 2016
The generated code for skia::BGRAConvolve2D is the same for both the GYP and GN builds, except that the GYP build also has unwind tables. Having unwind tables should not improve GYP build performance, but I need to cross-check.
Nov 12 2016
Even if specified as flags, the GN build explicitly disables pango+cairo and uses ozone only. GYP, however, uses pango and cairo. In build/config/ui.gni:

if (is_linux && !use_ozone) {
  use_cairo = true
  use_pango = true
} else {
  use_cairo = false
  use_pango = false
}

The GYP version of chrome has the following libraries as dependencies that do not turn up in the GN build:

libpango-1.0.so.0
libpangocairo-1.0.so.0
libpangoft2-1.0.so.0

Trying to explicitly set use_ozone=false in build/config/ui.gni did not make any difference; somehow use_ozone still got set to true.
Nov 14 2016
Daniel, any idea about disabling ozone and enabling pango+cairo? Changing use_ozone to false in build/config/ui.gni is not working. I changed the line regarding ozone (line 27) in build/config/ui.gni to:

use_ozone = false

I also added a few asserts at line 85 in build/config/ui.gni:

assert(is_linux)
assert(use_ozone)  # is not hit; use_ozone is somehow set to true
assert(use_pango)  # is hit; use_pango = false
assert(use_cairo)
Nov 14 2016
All the Chrome OS builds use ozone since we transitioned to freon 2 years ago. So I don't think you will be able to disable it (and I don't think it's part of the problem, since it's been like this forever).
Nov 14 2016
We are not trying to disable ozone, but to enable pango and cairo, which are enabled in the GYP build. Do you think disabling Cairo or Pango could be the reason for the performance drop?
Nov 14 2016
So, it seems that ozone does not imply pango and cairo in GN, but it did for GYP. Is this a bug in GN, or is this intended behavior?
Nov 14 2016
@33 Yes, that sounds like a bug. Actually, doing so would change the font rendering engine. I would expect derat@ to have noticed something like that :)
Nov 14 2016
I believe it is intentional that at least some ozone platforms (e.g., cast) do not depend on either pango or cairo. From what I can tell at a short glance, it does look like pango and cairo are off by default for CrOS + Ozone in GN, but were on by default for GYP: https://chromium.googlesource.com/chromium/src/+/branch-heads/2840/build/common.gypi#774 It should be easy enough to fix this if need be.
Nov 14 2016
We don't use Pango to draw text. I think that the code for that has been deleted; see issue 457307. I don't see any references to the USE_PANGO #define in Chrome apart from the line in build/config/BUILD.gn that sets it. There are some mentions of USE_CAIRO in Skia, though, along with this in ui/gfx/canvas.cc:

#if !defined(USE_CAIRO)
  // skia::PlatformCanvas instances are initialized to 0 by Cairo, but
  // uninitialized on other platforms.
  if (!is_opaque)
    canvas_->clear(SkColorSetARGB(0, 0, 0, 0));
#endif

It looks like we skip clearing new non-opaque canvases when USE_CAIRO is set. The comment suggests that this is supposed to happen, though.
Nov 15 2016
@#27: How did you generate the "perf profiles"? Can you share the results? AFAICT from tracing, BGRAConvolve2D() is only called from ImageOperations::Resize(), which is not called during webgl aquarium, so I doubt that is a factor for this particular issue.

@#29: From my previous experiments, forcing cairo & pango to true in ui.gni builds .gn with USE_PANGO=1 and USE_CAIRO=1; however, it does not change the performance of the .gn build to match .gyp.
Nov 15 2016
Enabling Pango/Cairo did not fix the GN performance issue, though it did pull in the pango and cairo library dependencies like the GYP build. Attaching the perf reports for the R54-8560 (GYP) and R54-8562 (GN) builds.

If you want to generate the reports yourself, the raw perf data for the two images is here:
https://drive.google.com/a/google.com/file/d/0B6pPpjh2SnF_QVJXdjJsWFJueU0/view?usp=sharing

The debug symbols (debug.tgz) for both images need to be downloaded from:
https://pantheon.corp.google.com/storage/browser/chromeos-image-archive/elm-release/R54-8560.0.0?pli=1
https://pantheon.corp.google.com/storage/browser/chromeos-image-archive/elm-release/R54-8562.0.0?pli=1

After extracting the debug symbols, note that the symbol files have "debug" in their names, so a soft link must be made for each file so that perf can find the names.

perf report --symfs debug -w 8,16,72,72 --fields sample,comm,symbol
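The symlinking step can be sketched like this, assuming the symbol files carry a literal ".debug" suffix (adjust the pattern if the actual naming differs):

```shell
# For every foo.debug under the symfs root, add a foo -> foo.debug symlink
# in the same directory, so perf can resolve the binary's plain name.
link_debug_names() {
  find "$1" -name '*.debug' | while read -r f; do
    ln -sf "$(basename "$f")" "${f%.debug}"
  done
}
# link_debug_names debug   # "debug" being the --symfs directory
```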
Nov 15 2016
After enabling pango/cairo on the GN build, ldd shows the major difference to be the libgmp library dependency. GYP uses these extra libraries:

libgmp - math library (might be important, so I am going to check on this)
libavahi-client - DNS
libavahi-common
libcups - printing
libgnutls - communication/encryption
libhogweed - communication/encryption
libnettle - communication/encryption
Nov 15 2016
Checking the final link command yielded more differences: GN explicitly links chrome against these libraries, while GYP does not:

-lEGL
-lGLESv2
Nov 16 2016
Manually linking chrome and nacl_helper without -lEGL and -lGLESv2 worked, but did not change performance.
Nov 16 2016
We are running out of options here. We have:

- compared compiler command lines
- compared linked libraries
- compared link lines
- looked at perf reports

and we have not found the cause of the regression. Daniel proposed a nice idea where we could bisect by mixing and matching object files. However, this does not work because the locations and names of object files differ between GYP and GN. I think we need help from someone with domain knowledge of this benchmark. The problem may be in the setup of the GPU or in communication with the GPU... Can someone from the graphics team please help with this?
Nov 16 2016
Checking v8: GYP defines arm_fpu="crypto-neon-fp-armv8", but GN sets arm_fpu="neon". Later on, in v8/BUILD.gn, GN adds more defines:

"CAN_USE_VFP3_INSTRUCTIONS",
"CAN_USE_VFP32DREGS",
"CAN_USE_NEON",

These are not set in GYP when building v8.
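If arm_fpu is a declared build argument in this milestone (it is declared in build/config/arm.gni in current trees; treat that as an assumption here), forcing the GYP value from the GN side would just be an args.gn override rather than an edit to checked-in files:

```gn
# args.gn (sketch): force the FPU variant the GYP build used.
arm_fpu = "crypto-neon-fp-armv8"
```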
Nov 16 2016
@43: I have looked at it; V8 is slower. The graphics team isn't responsible for v8 perf...
Nov 16 2016
@44: Removing the extra flags set for the GN build did not improve WebGL performance.
Nov 16 2016
ok, so we need someone from v8 to look at it. Who can help with this?
Nov 16 2016
jochen@ / danno@ - can one of you find someone to help us track down what appears to be a V8 perf regression probably related to the GN switchover?
Nov 18 2016
Prime suspect would be the missing -O3 for V8, which was fixed already in M53:
https://chromium.googlesource.com/v8/v8/+/365e32b1302
Landed in Chromium in:
https://chromium.googlesource.com/chromium/src/+/953cd4fa7eb
The change was briefly reverted for an experiment in this range:
https://chromium.googlesource.com/v8/v8/+log/bbb61d8aea7324..ce5265016bfb
This revert was at some point in M53..M54, but didn't make it into Chromium at all according to the logs.
Nov 18 2016
Secondary suspect would be the wrong arm_float_abi="softfp" and arm_use_neon=false in snapshots. This was fixed just recently in:
https://chromium.googlesource.com/chromium/src/+/2c53ba1d955a7da25d
Could you try applying this fix and measuring? If it has an impact, we should backmerge the commit.
Nov 18 2016
In the build logs, v8 is compiled with -O3 for both gyp and gn. There is no softfp in the build logs for v8 either.
Nov 18 2016
Tried this with latest chrome on elm and there is no improvement.
Nov 21 2016
There are several files that are compiled for the GN build but not for the GYP build. I believe this could be one reason for the GPU performance difference, since maybe another library implementation is used for GPU interaction.

services/ui/common/gpu_memory_buffer_impl.cc
services/ui/common/gpu_service.cc
services/ui/common/gpu_type_converters.cc
services/ui/common/mojo_buffer_backing.cc
services/ui/common/mojo_gpu_memory_buffer.cc
services/ui/common/mojo_gpu_memory_buffer_manager.cc
services/ui/common/switches.cc
services/ui/gles2/command_buffer_driver.cc
services/ui/gles2/command_buffer_driver_manager.cc
services/ui/gles2/command_buffer_impl.cc
services/ui/gles2/command_buffer_local.cc
services/ui/gles2/command_buffer_task_runner.cc
services/ui/gles2/gl_surface_adapter.cc
services/ui/gles2/gpu_impl.cc
services/ui/gles2/gpu_memory_tracker.cc
services/ui/gles2/gpu_state.cc
services/ui/gles2/ozone_gpu_memory_buffer.cc
services/ui/gles2/raster_thread_helper.cc
services/ui/gpu/display_compositor/compositor_frame_sink_factory_impl.cc
services/ui/gpu/display_compositor/compositor_frame_sink_impl.cc
services/ui/gpu/display_compositor/display_compositor_impl.cc
services/ui/gpu/display_compositor/display_impl.cc
services/ui/gpu/gpu_service_impl.cc
services/ui/gpu/gpu_service_mus.cc
services/ui/gpu/mus_gpu_memory_buffer_manager.cc
services/ui/input_devices/input_device_server.cc
Nov 21 2016
The obvious things that changed during the V8 GN switch don't affect the used revision. Making the bug available again for comment 55, as this isn't v8-related. I'd still do a few more sanity checks of the build and link flags, especially for both toolchains (i.e., snapshot and target).

Could somebody provide simple build instructions? I have never built SimpleChrome before. My context: I have a Chromium checkout, and I'd only need to generate the ninja files, not actually compile. What I did for gn is the following (with gyp I'm a bit clueless):

git co 54.0.2790.0
gclient sync
cros chrome-sdk '--board=daisy' --nocolor --use-external-config --clear-sdk-cache -- python -u tools/mb/mb.py gen -m chromium.chromiumos -b 'ChromiumOS daisy Compile' --config-file tools/mb/mb_config.pyl //out_daisy/Release
Nov 21 2016
To rule out any problems with V8's snapshot toolchain, I have the following suggestion: could you test the gn and gyp versions with the v8 snapshot files of the respective other build? I'm not sure if external snapshots were shipped for cros in this milestone. If they were, there should be a natives_blob.bin and a snapshot_blob.bin file in the build output directory, side-by-side with the executable. You could temporarily move those two files from the gyp to the gn folder, and vice versa, before running the tests. If the gyp and gn versions of the snapshot files are equal, then this would be pointless, of course.
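The suggested swap can be sketched as a small shell helper; the output directory names passed in are assumptions (they match the SimpleChrome layout used elsewhere in this thread):

```shell
# Swap V8's external snapshot files between two build output directories.
swap_snapshots() {
  gyp_dir=$1; gn_dir=$2
  for f in natives_blob.bin snapshot_blob.bin; do
    mv "$gyp_dir/$f" "$f.tmp"      # stash the gyp copy
    mv "$gn_dir/$f" "$gyp_dir/$f"  # gn copy into the gyp dir
    mv "$f.tmp" "$gn_dir/$f"       # gyp copy into the gn dir
  done
}
# swap_snapshots out_elm/Release out_elm_gn/Release
```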
Nov 21 2016
I could not find a single version that could build with both gyp & gn. The range around the .gyp -> .gn transition had some issues with "swiftshader", which I fixed in the patches in #18. Here are the steps I've been using:

cros chrome-sdk --board=elm --chroot=~/chromeos/chroot/ --internal --fastbuild
git fetch https://chromium.googlesource.com/chromium/src refs/changes/96/403696/1 && git checkout FETCH_HEAD
gclient sync

# build .gn
rm -rf out_elm_gn/
gn gen out_elm_gn/Release --args="$GN_ARGS"
ninja -C out_elm_gn/Release -j 5000 -l 50 -v -n chrome chrome_sandbox nacl_helper > /tmp/b_gn

# build .gyp
rm -rf out_elm/
gclient runhooks
ninja -C out_elm/Release -j 5000 -l 50 -v -n chrome chrome_sandbox nacl_helper > /tmp/b_gyp

And then I compare the .gn & .gyp flags with the parse_build.py script in #21:

python ./parse_build.py build_gyp build_gn > diff_gyp_gn
Nov 21 2016
.gn w/ .gyp (natives_blob.bin, snapshot_blob.bin) shows the same performance regression.
Dec 19 2016
There is an owner on this bug, but the status was not "Assigned" or "Started". Fixing. If you do not own this bug, please remove yourself as the owner and make the status "Available".
May 12 2017
djkurtz@: The performance difference seems to be fixed in the latest builds. There are two performance bumps, in Dec and Feb, for the elm board in our performance-tracking dashboard. https://dashboards.corp.google.com/google::_63a30d8c_d5a2_4d77_a886_648cbea75702?f=date:bt:1441954800000000,1494486000000000
May 15 2017
Too much has changed since then, and it is probably extremely difficult to do a side-by-side .gyp vs. .gn comparison now.
Comment 1 by lloz...@google.com, Oct 22 2016