Issue metadata
Sign in to add a comment
|
12.8%-68.6% regression in loading.desktop at 515363:515504 |
||||||||||||||||||||
Issue descriptionSee the link to graphs below.
,
Nov 13 2017
๐ Pinpoint job started. https://pinpoint-dot-chromeperf.appspot.com/job/1699cc89f80000
,
Nov 14 2017
๐ Couldn't reproduce a difference. https://pinpoint-dot-chromeperf.appspot.com/job/1699cc89f80000
,
Nov 14 2017
๐ Pinpoint job started. https://pinpoint-dot-chromeperf.appspot.com/job/14fa8029f80000
,
Nov 14 2017
๐ Found a significant difference after 1 commit. https://pinpoint-dot-chromeperf.appspot.com/job/14fa8029f80000 Re-land "Switch Mac to use time_exploded_posix.cc." By asvitkine@chromium.org ยท Fri Nov 10 00:38:46 2017 chromium @ af2ab97d5f8900c802fe6a20e726c4f4b8392228 Understanding performance regressions: http://g.co/ChromePerformanceRegressions
,
Nov 14 2017
Interesting. If this is real, it means the POSIX time implementations take more CPU time (but less wall clock time per UMA Sampling Profiler data).
,
Nov 14 2017
Here's how things look in UMA: https://uma.googleplex.com/p/chrome/timeline_v2/?sid=553b3654854c818c1ef15ea25b84e1b9 So hard to tell if there was any effect. There's some day-of-week fluctuations but it does look like like it might have gotten a tad better.
,
Nov 14 2017
We see this on chromium-rel-mac11-pro but not chromium-rel-mac11-air or chromium-rel-mac12. https://chromeperf.appspot.com/report?sid=37d9412197cc7ec5b38abda279575e65f34131c73b4ba965b28cc5b7fe7d9596&start_rev=511046&end_rev=516267 Which is very strange! Seems it doesn't affect 10.12 or 10.11 laptop - only 10.11 desktop? I don't know why that would be. But if that's indeed the case, I think this is reasonable to WontFix given the change improves wall clock time (see https://bugs.chromium.org/p/chromium/issues/detail?id=781601#c22) and only increases CPU time on a desktop machine (where battery isn't an issue) on 10.11 only. +shrike +sullivan for thoughts
,
Nov 14 2017
Although while looking on TheVerge data, we do see what looks like a regression on all three Mac configs: https://chromeperf.appspot.com/report?sid=5fd20246ed7d4bfd2daa8151cc9eb2309aa397e0b1903b4376522c02ef4cea13&start_rev=511046&end_rev=516267
,
Nov 16 2017
Haven't had time to to try to repro this locally yet. Would still like to. Some other thoughts: - Previous implementation had long wall clock time and occupied UI thread a lot (as a result of DataUse measurements). - Is it possible that with that duration reduced, it lets the UI thread get more work done - e.g. could there be more FPS or repaint events for the page with the change? Do we have a way to tell?
,
Nov 16 2017
Speculating: This behavior seems to indicate that the old way may have been calling into a platform library (internally, in Apple's libraries) and acquiring a lock that is under contention. This would explain why a call took longer to run (blocking the UI thread) but the CPU usage was lower (if it was idle while waiting on the lock).
,
Nov 22 2017
Was able to run this locally via: tools/perf/run_benchmark loading.desktop --browser=exact --browser-executable=/Users/asvitkine/chromium/src/out/gn-rel/Chromium.app/Contents/MacOS/Chromium --story-filter=TheVerge However, in particular TheVerge story seems to be very flaky. It defaults to doing 4 runs - but even between those runs it seems that the website ends up working differently. Sometimes it plays a video and other times it doesn't. So I see variation from 20ms to 60ms of run time of the test between those two runs. So not sure I can make much of this using local testing. I can try one of the other stories that showed a regression tomorrow. (My hypothesis is still that the improved runtime of the functions - see https://bugs.chromium.org/p/chromium/issues/detail?id=781601#c25 - are freeing up the main thread allowing more work to be done. But I need to confirm this.)
,
Nov 23 2017
Tried the Vnexpress story. Ran 10x (20 tests) with a build with and without the change and don't see a different in cpu time. tools/perf/run_benchmark loading.desktop --browser=exact --browser-executable=/Users/asvitkine/chromium/src/out/gn-rel/Chromium.app/Contents/MacOS/Chromium --story-filter=Vnexpress --pageset-repeat=10 Here's a screenshot of the two runs. Run on the left is the one with the old implementation. Numbers are actually a bit worse. At least this case didn't suffer from the nondeterminism of TheVerge one. Although the content also doesn't look like what Vnexpress looks like. (The content in the benchmark seems to be dominated by a full screen screenshot and some ads - vs. a bunch of stories on the live site.) Will try goo.ne.jp next - which is actually what Pinpoint found during bisect (the other ones were just lumped with it.) I guess I should have started with it.
,
Nov 23 2017
+kouhei +ksakamoto Adding loading.desktop benchmark owners re: the issues I've mentioned above. Also +nednguyen fyi.
,
Nov 23 2017
goog.ne.jp does seem to repro reliably locally - will use it to investigate. tools/perf/run_benchmark loading.desktop --browser=exact --browser-executable=/Users/asvitkine/chromium/src/out/gn-rel/Chromium.app/Contents/MacOS/Chromium --story-filter=goo.ne.jp
,
Nov 23 2017
Still not much luck in getting to the cause of cpu time change. However, I did notice that timeToOnLoad actually decreased significantly at the same time (on goo.ne.jp which is one of the affected pages): https://chromeperf.appspot.com/report?sid=92937ba51ad4ed6f4a6b3a391fdf36909e3bd23c4374d0c19c5e83a757350aa6&start_rev=511046&end_rev=519003
,
Nov 23 2017
So here's my interpretation of the results: 1. My changes speeds wall clock time of the time functions (see crbug.com/781601 ). 2. This allows some tasks to run sooner than they would before. For example, some tasks necessary for OnLoad() to happen to run while other threads are still working on FCP. 3. Hence, OnLoad() complete sooner, FCP wall time is unaffected but FCP cpu time is increased. The extra CPU time is work that would have happened anyway - but later - and is on the critical path to OnLoad() - but not to FCP. So, per the above, I'm marking as WontFix. Note: This tooling continues to leave a lot to be desired. :( I was still not able to get meaningful cpu profiles from these benchmarks despite being able to repro their results locally. Happy to share all the issues I've ran into if there's interest. Let me know.
,
Nov 24 2017
Thanks for the in-depth analysis here. I'm convinced. +benhenry@, as this is an example of an interesting tradeoff. |
|||||||||||||||||||||
►
Sign in to add a comment |
|||||||||||||||||||||
Comment 1 by 42576172...@developer.gserviceaccount.com
, Nov 13 2017