
Issue 698343

Starred by 2 users

Issue metadata

Status: WontFix
Owner:
Closed: Sep 2017
Cc:
EstimatedDays: ----
NextAction: ----
OS: Mac
Pri: 3
Type: Bug-Regression




Spike in PerformanceMonitor.HighCPU.RendererProcess on Dev channel

Reported by shrike@chromium.org (Project Member), Mar 3 2017

Issue description

I received a chirp about a spike in PerformanceMonitor.HighCPU.RendererProcess, occurring after the 58.0.3018.3 release:

https://uma.googleplex.com/timeline_v2?sid=a59f508ff86973009f7cf279f9638871

The same spike does not appear on Windows, which points to a Mac-only regression. Cross-referencing with Canary, I looked at 58.0.3017.0 through 58.0.3018.3:

https://chromium.googlesource.com/chromium/src/+log/58.0.3017.0..58.0.3018.0?pretty=fuller&n=10000

I originally suspected

[ServiceWorker] Mojofy ActivateEvent of Service Worker
https://codereview.chromium.org/2678733002

because of my recent performance work showing that IPC is expensive. Not having an easy way to test that theory directly, I used Instruments to compare the performance of ServiceWorkerContextTest* and ServiceWorkerJobTest* in content_unittests, running the tests in a single process and in a continuous loop (invocations below).
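
The invocations were of this form (the ServiceWorkerContextTest variant is my reconstruction by analogy with the ServiceWorkerJobTest one; the out/component path depends on the local build configuration):

  out/component/content_unittests --gtest_repeat=-1 --single-process-tests --gtest_filter=ServiceWorkerContextTest*
  out/component/content_unittests --gtest_repeat=-1 --single-process-tests --gtest_filter=ServiceWorkerJobTest*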

For ServiceWorkerContextTest, over 48s of run time there was an additional 160ms of CPU time with the patch. Time spent in mojo::Connector::ReadSingleMessage() increased by 1s of CPU time (a 192% increase), so there must have been wins of roughly 840ms elsewhere to bring the net regression down to just 160ms.

ServiceWorkerJobTest regressed by 110ms in a 55s run, with an extra 470ms occurring in Test::Run(). More analysis could have produced a list of the specific tests that regressed, but instead I went back to the UMA data.

Splitting the data by specific recent Dev releases produced the second graph:

https://uma.googleplex.com/timeline_v2?sid=0086b53ad8ae3600cd217e631ed34c22

This graph shows each release's contribution to the total shown in the first graph. The three releases show a pattern of rising up to some peak as the release is picked up by Dev channel users, and then dropping back down as users move on to the next release. The middle peak (58.0.3013.3) deviates from this pattern in that it spikes a second time (and in fact 58.0.3004.3 has a small spike as well around the same time). These spikes from older releases combine with the normal spike from 58.0.3018.3 to produce an outsized spike that was picked up by chirp.

At least that's my theory. Leaving this bug open for now to follow up on it in the near future.

 
Screen Shot 2017-03-03 at 11.46.42 AM.png (105 KB)
Screen Shot 2017-03-03 at 12.16.14 PM.png (114 KB)
I guess the regression in the unit tests happened because of changes in the tests themselves. The legacy IPC system does not have a good way to send IPC messages in unit tests using an actual sender object, so receiver functions were called directly instead of dispatching messages through the legacy IPC architecture. Now that it's converted to mojo, unit tests can actually send the messages via mojo interfaces instead of calling the receiver functions. That is likely a reason for the regression in the unit tests.
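
A minimal self-contained sketch of that difference (a toy, not actual Chromium or mojo code; all class and function names here are invented for illustration):

// Toy illustration (not Chromium code): the old-style test calls the
// receiver directly; the mojo-style test routes the same call through a
// message queue, paying for dispatch (and, in real mojo, serialization).
#include <functional>
#include <iostream>
#include <queue>
#include <utility>

class Receiver {
 public:
  void OnActivateEvent() { ++activations_; }
  int activations() const { return activations_; }

 private:
  int activations_ = 0;
};

// Stand-in for a message pipe: messages are queued, then dispatched later.
class FakePipe {
 public:
  void Post(std::function<void()> message) { queue_.push(std::move(message)); }
  void RunUntilIdle() {
    while (!queue_.empty()) {
      std::function<void()> message = std::move(queue_.front());
      queue_.pop();
      message();
    }
  }

 private:
  std::queue<std::function<void()>> queue_;
};

int main() {
  Receiver receiver;

  // Legacy-IPC-era unit test: no convenient sender object, so the test
  // invokes the receiver function directly.
  receiver.OnActivateEvent();

  // Mojo-era unit test: send a real message and pump the queue, which
  // exercises dispatch machinery the direct call skipped entirely.
  FakePipe pipe;
  pipe.Post([&receiver] { receiver.OnActivateEvent(); });
  pipe.RunUntilIdle();

  std::cout << "activations: " << receiver.activations() << "\n";  // prints 2
  return 0;
}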

I hope the suspected patch is not actually the cause of the regression, because it just converts the ActivateEvent message to use mojo.


> because of my recent performance work showing that IPC is expensive. 

I know the IPC system consumes some amount of time, but I'm not familiar with the actual numbers, or with how much legacy IPC and mojo IPC differ. What did you investigate in your recent work, and what was the result?
I believe the conclusion re: mojo IPC is that it is not much more expensive than the traditional IPC. I will cc you on a doc which discusses the relatively large portion of CPU time that goes towards IPC (a combo of the IPC machinery and the number of IPCs that are being sent).
Jayson - can we close this bug? If not, what action do we need before doing so?

Comment 4 by shrike@chromium.org, Sep 22 2017

Status: WontFix (was: Assigned)
