New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 814697 link

Starred by 1 user

Issue metadata

Status: WontFix
Owner:
Last visit > 30 days ago
Closed: Mar 2018
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Android
Pri: 2
Type: Bug-Regression



Sign in to add a comment

21.4%-25.3% regression in thread_times.key_silk_cases at 537902:537961

Project Member Reported by alexclarke@chromium.org, Feb 22 2018

Issue description

See the link to graphs below.
 
Project Member

Comment 1 by 42576172...@developer.gserviceaccount.com, Feb 22 2018

All graphs for this bug:
  https://chromeperf.appspot.com/group_report?bug_id=814697

(For debugging:) Original alerts at time of bug-filing:
  https://chromeperf.appspot.com/group_report?sid=1402c1332bb4673039f97d1f8c15f154282a15d13f1315ecc254740742d8478b


Bot(s) for this bug's original alert(s):

android-webview-nexus5X
android-webview-nexus6
Project Member

Comment 3 by 42576172...@developer.gserviceaccount.com, Feb 23 2018

😿 Pinpoint job stopped with an error.
https://chromeperf.appspot.com/job/14fb8d4f840000
Owner: jamwalla@chromium.org
Status: Assigned (was: Untriaged)
jamwalla, can you take a look? The bisect didn't report back to the bug but if you look at https://pinpoint-dot-chromeperf.appspot.com/job/14fb8d4f840000 you can see a pretty clear regression at r537922, "WebView: enable cc::switches::kCheckDamageEarly". Re-running bisect to double-check.
Cc: changwan@chromium.org boliu@chromium.org
I can reproduce this - the test runs a CSS animation - when the checkDamageEarly flag is on, there are about half as many BeginFrames, but they're longer.

I think frame times are longer with the flag on because we call PrepareTiles twice per frame - once as a scheduled action and once during CommitComplete.

+bo, is this related to the optimization you mentioned in crbug/687695#c36?

Comment 7 by boliu@chromium.org, Mar 21 2018

Components: Mobile>WebView
Labels: OS-Android
you got traces with more events? The only thing I got from the traces on the bot is that previously, renderer compositor wasn't actually producing frames.

Does sound a lot like crbug/687695#c36, yeah. But I don't understand why things got worse
Cc: danakj@chromium.org
Hmm... We should investigate this specific case more, but is there any way we can prevent perf regressions more proactively in the first place?

alexclarke@ / danakj@, any thought?

Comment 9 by danakj@chromium.org, Mar 21 2018

We'd discussed before only doing the PrepareTiles in WillBeginImplFrame if there's no main frame pending, when in low latency mode, to avoid doing it twice per frame if possible.

Comment 10 by boliu@chromium.org, Mar 21 2018

this is as good as it's going to get, getting a bug filed with traces that tell you what regressed, which CL caused the regression, and instructions on how to run the test
Here's the traces I got: locally, this regression is from 1.652 ms with the flag disabled to 2.103 ms with the flag enabled.
flag disabled trace.html
5.4 MB View Download
flag enabled trace.html
5.4 MB View Download
Re #10 the chromium CQ is often pretty slow, plenty of time to run some perf tests?

Comment 13 by boliu@chromium.org, Mar 21 2018

I'm sure the perf team can tell you costs associated with putting perf tests on cq. cq being slow is no excuse to make it slower. But this is the wrong place to have this discussion

Comment 14 by boliu@chromium.org, Mar 21 2018

Your local "flag disabled" trace has the scheduler stuck in high latency mode all the time (BeginMainFrame running in parallel with raster, and commit blocking for a long time). But trace from bot shows it's consistently staying in low latency mode. So not very representative. What device were you using? Maybe the additional tracing is enough overhead to push it to high latency mode? Try a faster device maybe. Or maybe just bad luck.

Anyway, I don't see LayerTreeHostImpl::DrawLayers skipping frames due to draw (when it's super short and doesn't have BenchmarkInstrumentation::ImplThreadRenderingStats underneath it). So I still don't understand what was happening in the trace before, or what could have changed that behavior.

Comment 15 by boliu@chromium.org, Mar 21 2018

Err, not drawing is a red herring. I was comparing different stages of the test to each other.

I think double prepare tiles is also a red herring from being in high latency mode in the local traces. jamwalla trying to confirm that now.

Still no theory on why thread time on so many different threads all regressed..
Cc: perezju@chromium.org
Ok, re #8, #10, #12, I think we should probably run perf trybots more comprehensively for an extensive optimization change like this.

Perf Trybots:
https://chromium.googlesource.com/chromium/src/+/master/docs/speed/perf_trybots.md

System Health Benchmarks:
https://docs.google.com/document/d/1BM_6lBrPzpMNMtcyi2NFKGIzmzIQ1oH3OlNG27kDGNU/edit?ts=57e92782#heading=h.fama47i7l06v

Chrome Speed Metrics: Tracking
https://docs.google.com/spreadsheets/d/1gY5hkKPp8RNVqmOw1d-bo-f9EXLqtq4wa3Z7Q8Ek9Tk/edit#gid=0
(There's mention of Thread Times here - not sure what exact benchmark runs this.)

perezju@, is there any perf benchmarks we're missing above but still need to look out for? We previously ran https://bugs.chromium.org/p/chromium/issues/detail?id=687695#c41 .

Comment 17 by boliu@chromium.org, Mar 21 2018

This was not an "extensive optimization change", and this is not a severe regression to block release. And jamwalla did due diligence of running perf trybots (https://chromium.googlesource.com/chromium/src/+/lkcr/docs/speed/perf_trybots.md) before landing. They didn't show anything afaik.

And please a different bug/email thread for that discussion
Cc: vmi...@chromium.org
+vmiura benchmark owner of thread_times.key_silk_cases for input on what that benchmark is measuring and assess the impact of the regression filed.

Comment 19 by boliu@chromium.org, Mar 22 2018

Another unverified theory is that the denominator went down. All of these are "per frame" metrics, so how are frames counted exactly? That CL got rid of a bunch of LayerTreeHostImpl::DrawLayers events. So if frames are counted by number of DrawLayers, then that would explain why so many seemingly unrelated thread times all regressed, and by roughly the same amount.
Looking at https://cs.chromium.org/chromium/src/tools/perf/metrics/timeline.py I think this is a good theory, "per frame" is measured by counting LayerTreeHostImpl::DrawLayers. So we have fewer draws in the beginning of the test, and then the behavior is the same during the actual animation, but now calculating mean cpu time with a fewer number of frames.

+vmiura does this sound right? In that case let's mark this WontFix.

Comment 21 by boliu@chromium.org, Mar 22 2018

Status: WontFix (was: Assigned)
Yeah, this is not a regression at all then.

This removed all the DrawLayers events that previously just early returned immediately due to no damage, but did not remove any other work. So only the artificial denominator moved.
Project Member

Comment 22 by 42576172...@developer.gserviceaccount.com, Mar 29 2018

Cc: kinuko@chromium.org dpranke@chromium.org jbudorick@chromium.org jamwalla@chromium.org
Status: Assigned (was: WontFix)
📍 Found significant differences after each of 3 commits.
https://pinpoint-dot-chromeperf.appspot.com/job/16bdfbd1440000

Remove win-msvc-dbg from the CQ. by dpranke@chromium.org
https://chromium.googlesource.com/chromium/src/+/a00fb2774344bce8fc913e4028a8053baa795739

Move WebKit/common/service_worker mojom files to WebKit/public/mojom by kinuko@chromium.org
https://chromium.googlesource.com/chromium/src/+/589233627aed664191bab4c63dabd2afdee31199

WebView: enable cc::switches::kCheckDamageEarly by jamwalla@chromium.org
https://chromium.googlesource.com/chromium/src/+/05381aa3a9570439a16774f120bc3c0d20c9de81

Understanding performance regressions:
  http://g.co/ChromePerformanceRegressions
Status: WontFix (was: Assigned)
The above is about thread_times.key_silk_cases.thread_other_cpu_time_per_frame, another "per frame" metric. Not a regression, see #19-21.

Sign in to add a comment