speedometer2 regressed 2-4% on llvm-next
Issue description: Manoj and I checked the dashboard and logs, and the regression seems to have started with the last llvm-next roll.
Jul 18
Is speedometer the only benchmark that has regressed?
Jul 18
speedometer/speedometer2 have definitely regressed. I have seen some degradation in the page cyclers as well, but those results can be flaky.
Jul 18
Notes from offline discussion:
- We need to try to figure this out before doing the compiler upgrade (which we want to do soon after the branch point).
- Ting-Yuan is going to test to see if ToT LLVM fixes the problem.
- If ToT does not fix the problem, Caroline will investigate (bisection on LLVM) as soon as she finishes the high-priority CFI work (hopefully by Friday or Monday).
Assigning this to Ting-Yuan until he finishes the ToT check.
Jul 20
Just for the record, "llvm-next" is LLVM r333878.
Jul 23
I am almost done with correctness testing, and I believe it will not show any issues. How is the analysis of the performance issue going? Thanks.
Jul 23
There were a couple of packages that didn't build with ToT LLVM, but I got a successful build just now. The testing is running; it should finish in 2 hours.
Jul 23
Unfortunately the regression is still present with ToT; I got 2.4-2.5% this time.
Benchmark: speedometer2; Iterations: 3
Total__summary (ms):
  current (pass:3 fail:0): Amean 7841.30, StdDev 36.41, StdDev/Mean 0.5%
  next (pass:3 fail:0):    Amean 8044.38, StdDev 16.18, StdDev/Mean 0.2%, GmeanSpeedup -2.5%, p-value 0.00
  tot (pass:3 fail:0):     Amean 8032.29, StdDev 42.63, StdDev/Mean 0.5%, GmeanSpeedup -2.4%, p-value 0.01
Jul 23
Just for the record, some helpful suggestions came up in the meeting (pardon me if there are things I forgot):
1. Check whether it is reproducible using only Chrome, i.e., speed up the bisection by `cros deploy chrome`.
2. Send a lot of tryjobs :)
3. Check whether the CPU profiles, and thus the generated code, are distinguishable, so that the bisection can be done without running the benchmarks (see the sketch below).
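A rough illustration of suggestion 3, assuming two locally built toolchains and a representative hot source file are available (the paths, file names, and flags below are placeholders, not taken from this bug): compile the same file with both compilers and diff the disassembly, so that a bisection step only needs a compile, not a benchmark run.
$ # Compile one hot translation unit with the known-good and the suspect compiler.
$ /path/to/good/clang++ -O2 -c hot_file.cc -o good.o
$ /path/to/suspect/clang++ -O2 -c hot_file.cc -o bad.o
$ # Diff the generated code; a non-empty diff distinguishes the two without running speedometer2.
$ objdump -d --no-show-raw-insn good.o > good.asm
$ objdump -d --no-show-raw-insn bad.o > bad.asm
$ diff -u good.asm bad.asm | head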
Jul 23
For faster bisection, the llvm monorepo (single git repo config) can be used: https://llvm.org/docs/GettingStarted.html#id18
Quick instructions (based on my experience):
$ git clone https://github.com/llvm-project/llvm-project-20170507/ llvm-monorepo
$ mkdir llvm-build && cd llvm-build
$ cmake -GNinja ../llvm-monorepo/llvm -DLLVM_ENABLE_PROJECTS="clang;compiler-rt" ...rest of cmake args...
$ ninja
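To automate the bisection on top of that checkout, a minimal sketch, assuming a helper script (hypothetical, called check.sh here) that rebuilds with ninja and exits non-zero when the regression reproduces; the good/bad revisions are placeholders:
$ cd llvm-monorepo
$ git bisect start <bad-rev> <good-rev>
$ git bisect run ../check.sh
$ git bisect reset   # when done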
Aug 6
It appears that the regression was caused by r332058. We are in the process of working with the author of that change to track down and fix the issue.
Aug 7
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/overlays/chromiumos-overlay/+/9e4f94a07f459c027608be002fd9810b3716368d

commit 9e4f94a07f459c027608be002fd9810b3716368d
Author: Manoj Gupta <manojgupta@google.com>
Date: Tue Aug 07 02:54:16 2018

    llvm-next: revert r332058.

    r332058 is causing performance regressions on speedometer2 benchmark.
    Revert it till we have a fix for the performance regression.
    Does not impact the current llvm/clang used in Chrome OS.

    BUG=chromium:864781
    TEST=USE="llvm-next" sudo emerge llvm works.

    Change-Id: Ib37d267efa424c7e5b7d3ac7bc38f972648dfdb3
    Reviewed-on: https://chromium-review.googlesource.com/1162753
    Commit-Queue: Manoj Gupta <manojgupta@chromium.org>
    Tested-by: Manoj Gupta <manojgupta@chromium.org>
    Trybot-Ready: Manoj Gupta <manojgupta@chromium.org>
    Reviewed-by: Luis Lozano <llozano@chromium.org>

[rename] https://crrev.com/9e4f94a07f459c027608be002fd9810b3716368d/sys-devel/llvm/llvm-7.0_pre331547_p20180529-r9.ebuild
[add] https://crrev.com/9e4f94a07f459c027608be002fd9810b3716368d/sys-devel/llvm/files/llvm-revert-afdo-hotness.patch
Aug 31
OK, I managed to get perf stat output.
Good image:
Performance counter stats for 'system wide':
772428.807708 cpu-clock (msec) # 2.000 CPUs utilized
913194 context-switches # 0.001 M/sec
10766 cpu-migrations # 0.014 K/sec
2591837 page-faults # 0.003 M/sec
1171135470755 cycles # 1.516 GHz (50.01%)
434978099919 instructions # 0.37 insn per cycle (75.01%)
84068699130 branches # 108.837 M/sec (74.97%)
7085158565 branch-misses # 8.43% of all branches (75.02%)
386.214511715 seconds time elapsed
Bad image:
Performance counter stats for 'system wide':
810060.870342 cpu-clock (msec) # 2.000 CPUs utilized
942924 context-switches # 0.001 M/sec
12090 cpu-migrations # 0.015 K/sec
2610987 page-faults # 0.003 M/sec
1221591877506 cycles # 1.508 GHz (50.01%)
439025903614 instructions # 0.36 insn per cycle (75.00%)
84967296615 branches # 104.890 M/sec (74.98%)
8765602576 branch-misses # 10.32% of all branches (75.01%)
405.030538950 seconds time elapsed
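For reference, a minimal sketch of how a system-wide collection like the one above can be gathered on the device while the benchmark runs (the event list and the duration are assumptions, not taken from this bug):
$ # Count the same events system-wide for roughly the length of one benchmark run.
$ perf stat -a -e cpu-clock,context-switches,cpu-migrations,page-faults,cycles,instructions,branches,branch-misses -- sleep 400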
Aug 31
Thanks Ting-Yuan for getting the data, very helpful. The instruction and branch counts increased substantially in the bad image, which possibly indicates some missing inlining in bad. My patch actually tries to increase inlining for warm callsites; it is possible that more inlining for warm callsites leads to missing inlining at certain hot callsites because of LLVM's bottom-up inliner. That is a known issue, and Easwaran and David are currently working on a priority-based inliner which could potentially help here, but it may not be mature enough yet. perf report can help me find where the missing inlining is, so we may need to get perf report working as normal first. If we can find where the missing inlining is, I can provide a workaround, like bumping up the inline threshold for the affected module. Thanks, Wei.
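A rough sketch of that kind of investigation (command lines are illustrative assumptions, not from this bug): record call-graph profiles on both images, compare the hottest symbols (a callee that was inlined in the good image but not in the bad one shows up as its own hot symbol in the bad profile), then, as a per-module workaround, raise the inline threshold when compiling the affected file.
$ # On each image, record a system-wide call-graph profile while the benchmark runs.
$ perf record -a -g -- sleep 60
$ perf report --sort=dso,symbol
$ # Hypothetical per-module workaround: bump the inliner threshold for the affected file only.
$ clang++ -O2 -mllvm -inline-threshold=1000 -c hot_module.cc -o hot_module.o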
Aug 31
Sorry, George and I will be OOO next week and won't be able to provide much help. We'll continue to help as soon as we are back.
Sep 3
Assigning to gbiv@ as he will look into perf report and work with wmi@ on the data / experiments required.