New issue
Advanced search Search tips

Issue 835861 link

Starred by 2 users

Issue metadata

Status: WontFix
Owner:
Closed: Aug 30
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Bug-Regression

Blocking:
issue 838940



Sign in to add a comment

3.3%-400.2% regression in battor.steady_state at 552586:552591

Project Member Reported by mustaq@chromium.org, Apr 23 2018

Issue description

See the link to graphs below.
 
Project Member

Comment 1 by 42576172...@developer.gserviceaccount.com, Apr 23 2018

All graphs for this bug:
  https://chromeperf.appspot.com/group_report?bug_id=835861

(For debugging:) Original alerts at time of bug-filing:
  https://chromeperf.appspot.com/group_report?sid=c9dd5033b18379d6b42babd946a6a6fd357ba30715ed615ff7b6575cc5fbb19c


Bot(s) for this bug's original alert(s):

chromium-rel-mac11-air
chromium-rel-mac11-pro
chromium-rel-mac12
chromium-rel-win10
chromium-rel-win7-dual
chromium-rel-win7-gpu-intel
chromium-rel-win7-x64-dual
chromium-rel-win8-dual
Project Member

Comment 3 by 42576172...@developer.gserviceaccount.com, Apr 23 2018

Cc: kbr@chromium.org rdevlin....@chromium.org alex...@chromium.org lukasza@chromium.org isherman@chromium.org pastarmovj@chromium.org
Owner: lukasza@chromium.org
Status: Assigned (was: Untriaged)
📍 Found a significant difference after 1 commit.
https://pinpoint-dot-chromeperf.appspot.com/job/14cbe751c40000

Make --site-per-process the default on ToT via fieldtrial_testing_config by lukasza@chromium.org
https://chromium.googlesource.com/chromium/src/+/fb1ccf02ee8ca79e1404abfd3a3a7d540b7d2dbd

Understanding performance regressions:
  http://g.co/ChromePerformanceRegressions
Components: Internals>Sandbox>SiteIsolation
Cc: charliea@chromium.org
+charliea@ who owns the battor.steady_state benchmark

I assume that the new behavior with site-per-process is expected and should be used as the new baseline.  OTOH, I am not sure who can make this decision and how.
I think we know about and are investigating some of the regressions (paint metrics, input event latency), but there are also regressions in power (perhaps tied to regressions in average CPU time?), which I don't think we've thought about much before and might want to sanity check.
One interesting thing to note is that the Taobao story exercised in this bug visits https://world.taobao.com/ which doesn't have that many OOPIFs - there are only 2 sites on that webpage: taobao.com and alicdn.com
Cc: -isherman@chromium.org
FWIW, I've tried to gather ETW traces when locally running the "taobao" performance story/test on the 68.0.3416.0 canary (once with --site-per-process cmdline flag and once without the flag and with the trials disabled via chrome://flags).  I've saved the traces at https://drive.google.com/open?id=1zYBrlfQVRp4arUsD7d_J_Rfua8aZi1WB (Google-internal, but I can re-share if needed).

I suspect that with site-per-process a page can schedule work in more than 1 renderer process at a time (which leads to higher CPU utilization).  OTOH, I am not sure how to verify this hypothesis with the gathered ETW traces.

Also - I've just realized that the pinpoint job in #c3 talks about timeToFirstContentfulPaint, so I've scheduled another pinpoint job in #c9, hoping to get a confirmation of a regression in cpu_time_percentage. 
Cc: isherman@chromium.org
📍 Found a significant difference after 1 commit.
https://pinpoint-dot-chromeperf.appspot.com/job/14c96c33c40000

Make --site-per-process the default on ToT via fieldtrial_testing_config by lukasza@chromium.org
https://chromium.googlesource.com/chromium/src/+/fb1ccf02ee8ca79e1404abfd3a3a7d540b7d2dbd

Understanding performance regressions:
  http://g.co/ChromePerformanceRegressions
Cc: -isherman@chromium.org
Blocking: 838940
I've just realized that the ETW traces in #c10 are gathered for a wrong story - the regression in battor.steady_state / cpu_time_percentage / #c11 happens (as seen in https://chromeperf.appspot.com/group_report?bug_id=835861) mostly in the following stories:
- www.indiatimes.com
- www.uol.com.br
- www.slideshare.net/patrickmeenan

Unfortunately, it turns out that battor.steady_state is a Mac-only benchmark (according to docs/speed/benchmark/harnesses/power_perf.md).  Attempting to run the benchmark (and gather ETW traces) on Windows gives back:
    D:\src\chromium\src>python tools\perf\run_benchmark run battor.steady_state --story-filter http...www.uol.com.br --extra-browser-args=--site-per-process --browser=canary
    ...
    battor.steady_state is disabled on the selected browser

I don't know how to do a CPU sampling profile diff that spans *all* chrome processes... :-(
FWIW, sadrul@'s LatencyInfo fix (r554321) doesn't seem to impact/recover the regresson shown for battor.steady_state / cpu_time_percentage_avg at https://chromeperf.appspot.com/group_report?bug_id=835861 - the chart shows a bump at r552589, but is more-or-less flat at r554321.

Comment 16 by kbr@chromium.org, May 2 2018

Cc: -kbr@chromium.org
Lukasz, you can try to run the story on Windows using: --also-run-disabled-tests. I'm not sure if it will work, but it's worth a try and, given the fact that the same thing that's affecting Mac seems to be affecting Windows, it seems likely that problems you find using ETW on Windows will carry over to Mac.
I've tried manually opening the websites on Windows and gathering ETW traces, but unfortunately nothing actionable or surprising jumped out at me :-(

I wonder if it is possible to compare the benchmark results with some UMA data.  I don't see good UMA metrics to compare against, but let me ask around.  So far I've found:
- Power.BatteryDischargeRate (CrOS-only) - doesn't change much
- PerformanceMonitor.*CPU.(Browser|Renderer)Process (Windows-only) - difficult to draw overall conclusions
Unfortunately, other than manually digging through the two ETW traces and perhaps looking at some code paths that you might expect to be stressed more with site isolation, I don't have any great suggestions on how to diff the two traces.

There definitely does definitely seem to be a ~3% power regression though.
Note that some discussion of these results is also happening in go/site-isolation-performance#heading=h.u3cs4xppn2j2 (google-internal link, sorry...)
lukasza@: any update from site-isolation-performance discussion?
Status: WontFix (was: Assigned)
RE: #c21: mustaq@

I don't think there are any actionable conclusions - maybe we just need to accept the new power consumption level as the new normal / as necessary tax for the better security guarantees offered by Site Isolation.  

I've also tried to look at more complete data we have from having Site Isolation on the stable channel in M67 and M68 (I've added it to go/site-isolation-performance#heading=h.u3cs4xppn2j2).  Interestingly, M67 data shows no statistically significant regression (which might support resolving this bug as WontFix) although M68 data shows a regression of about 1.92% at the 50th percentile (this regression seems like a separate bug, because it happened independently from and after we've turned on Site Isolation).

Given the above, I'd be leaning toward resolving this bug as WontFix.  If you think there are some remaining actions we should be taking here, please reactivate.
To follow-up, no statistically significant changes have been found in M69 in Power.BatteryDischargeRate - go/site-isolation-performance#heading=h.u3cs4xppn2j2

Sign in to add a comment