New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 741695 link

Starred by 2 users

Issue metadata

Status: WontFix
Owner:
Closed: Jul 2017
Cc:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Bug-Regression



Sign in to add a comment

96.2% regression in system_health.common_desktop at 481062:481128

Project Member Reported by pmeenan@chromium.org, Jul 12 2017

Issue description

See the link to graphs below.
 
Project Member

Comment 1 by 42576172...@developer.gserviceaccount.com, Jul 12 2017

All graphs for this bug:
  https://chromeperf.appspot.com/group_report?bug_id=741695

(For debugging:) Original alerts at time of bug-filing:
  https://chromeperf.appspot.com/group_report?sid=32e61d392dc881d9345fcfca046e28ace09ad41215ffc7e563f358af4ed54a1d


Bot(s) for this bug's original alert(s):

chromium-rel-mac11-pro
Project Member

Comment 3 by 42576172...@developer.gserviceaccount.com, Jul 12 2017

Cc: borisv@chromium.org
Owner: borisv@chromium.org

=== Auto-CCing suspected CL author borisv@chromium.org ===

Hi borisv@chromium.org, the bisect results pointed to your CL, please take a look at the
results.


=== BISECT JOB RESULTS ===
Perf regression found with culprit

Suspected Commit
  Author : borisv
  Commit : 9cea65766cfffee8db51e2f6edb694e0fe10ef93
  Date   : Wed Jun 21 06:38:37 2017
  Subject: Speed up allocateBranchIsland by skipping whole regions of allocated memory.

Bisect Details
  Configuration: mac_pro_perf_bisect
  Benchmark    : system_health.common_desktop
  Metric       : timeToFirstContentfulPaint_avg/load_tools/load_tools_stackoverflow
  Change       : 128.41% | 143.055666665 -> 326.75666666

Revision             Result                  N
chromium@481061      143.056 +- 14.7069      6      good
chromium@481095      135.539 +- 7.0154       6      good
chromium@481112      135.025 +- 6.72794      6      good
chromium@481120      136.018 +- 5.30243      6      good
chromium@481124      135.919 +- 7.72545      6      good
chromium@481125      308.221 +- 142.284      6      bad       <--
chromium@481126      297.697 +- 138.468      6      bad
chromium@481128      326.757 +- 43.5351      6      bad

To Run This Test
  src/tools/perf/run_benchmark -v --browser=release --output-format=chartjson --upload-results --pageset-repeat=1 --also-run-disabled-tests --story-filter=load.tools.stackoverflow system_health.common_desktop

More information on addressing performance regressions:
  http://g.co/ChromePerformanceRegressions

Debug information about this bisect:
  https://chromeperf.appspot.com/buildbucket_job_status/8974258207175756288


For feedback, file a bug with component Speed>Bisection

Comment 4 by borisv@chromium.org, Jul 12 2017

Any clue why we are seeing this only with Stackoverflow? I looked at other changes after this commit and overall noticed a nice 90ms+ improvement in first paint.

Comment 5 by borisv@chromium.org, Jul 12 2017

Also, how did we eliminate the other changes as possible contributors? My change impacts the startup of ALL processes and hence should have impacted ALL startup times.
I am not sure why we are only seeing it with stackoverflow, you can try looking at the traces as described in http://g.co/ChromePerformanceRegressions

As for how we eliminated the other changes, you can look at the bisect results in #3. The bisect bot did a binary search, from 143.056ms at r481061 to 326.757ms at r481128. You can see there is a pretty clear jump from 135.919ms at r481124 to 308.221ms at r481125.

Comment 7 by borisv@chromium.org, Jul 13 2017

Cc: sullivan@chromium.org
Well, it took me a while to figure out what has been happening. Here is the story:
1. If I put a 3 seconds sleep in the entry point of my changes the test is as fast as before (takes ~190ms). If I remove the sleep, the test takes ~300ms. If I comment the whole code (it is at EnableCFBundleBlocker() function), it still takes ~300ms.

This means that my optimization of the process startup time is actually causing more contentions and while speeding up most things, it is slowing down this particular test. I am not sure how to proceed and I would like to hear feedback from the owners of the test. 
Cc: nedngu...@google.com tdres...@chromium.org ksakamoto@chromium.org kouhei@chromium.org
Adding kouhei, ksakamoto, and tdresser since this is a regression in timeToFirstContentfulPaint_avg.

Adding nednguyen since IIUC the CL affects chrome startup, but we're seeing the regression in page load test cases since we start the browser every time.

Comment 9 by kouhei@chromium.org, Jul 14 2017

Would you post traces before/after change?
Results are attached. The file sleep3 contains the code, compiled with a sleep. In both files, look for load_tools@@timeToFirstContentfulPaint_avg.

Steps to generate these files:
1. Get chrome sources
2. Build release version of Chrome for Mac.
3. Run:
$tools/perf/run_benchmark -v --browser=release --output-format=chartjson --upload-results --pageset-repeat=10 --also-run-disabled-tests --story-filter=load.tools.stackoverflow system_health.common_desktop
4. Open the result .json file as printed by the test run, e.g. ~/chromium/src/tools/perf/results-chart.json or save at, as the next run will override it.

5. Open the file chrome/common/mac/cfbundle_blocker.mm and around line 230 (just before the mach_override_ptr), add this line:
sleep(3);

6. Repeat #2-#4. Compare the two results, looking particularly at load_tools@@timeToFirstContentfulPaint_avg
nochange.json
51.4 KB View Download
sleep3.json
51.2 KB View Download
Traces are available in the perf dashboard. 
Before: https://goo.gl/pv5fri
After: https://goo.gl/YDSYsp

I did further investigation. Dropping ~75ms of sleep in the startup of all subprocesses indeed speeds up the test back to its original values. I believe that the test that is failing is the green portion in the traces (at least it matches closely the reported times before and after the regression).
#12: to grab the trace link, you need to right click the trace button & click "Copy Link Address". The trace link in the browser address bar doesn't work because it's generated by cloud storage service ephemerally.
It's a bit puzzling that there's a bigger layout in the second trace.

In the first trace, the longest layout is 72ms, but in the second, it's 147ms.

The CPU duration is also shorter - it's not just an issue of contention.

Based on your description of the issue, it's also surprising that the navigation start occurs later in the second trace. Wouldn't we expect that to happen sooner?
Yes, I saw the layout taking longer, but the change does not impact it. Maybe I am not looking at the right region, but the green at the top shows that it started 105 ms earlier in the "after" set. 
Before:
 Title	Successful Load
 Start	
 918.203 ms
 Wall Duration	
 180.088 ms

After:
 Title	Successful Load
 Start	
 813.157 ms
 Wall Duration	
 385.073 ms

I was looking at the first navigationStart slice, and the first contentful paint metric in the metrics side panel.

I'm not sure what the Successful load slice means.
Well, the metrics do not tell me much - they just confirm this bug. I am looking at NavigationStartToCommit:

1. Navigation to about:blank starts faster after the change (625ms -> 554ms) and is shorter (197ms -> 115ms). This is inline with my change, as both the browser and the renderer have faster startup times.

2. Second navigation (stack overflow) (919, 73) -> (813, 34).

3. There seem to be another navigation handle (to casalemedia.com) that doesn't seem to be measured by the test. That one also starts earlier after my change, but takes longer to complete (1058, 227) -> (960, 298)

I am sure, I am missing something and I am really new to this kind of investigations. Is there a way for the UI to show only the slice that the test actually measures? Is this the green slice that I am looking at, or am I looking at something different? 
Cc: erikc...@chromium.org
(+Erik, who has tons of experience with those).
Cc: benjhayden@chromium.org dproy@chromium.org
Metric visualizations are really bad currently. The loading metrics don't show up in the trace UI.

+dproy is going to do some work on fixing this this quarter.

+benjhayden@, what is the "successful load" slice?
Erik and I looked at the traces. Overall after the change everything starts faster and the total CPU usage seem to decrease. One thing that we noticed as difference is that in the before case we see three separate performLayout calls, but in the after change we see only one. It is possible that the renderer is somehow merging the page loader events impacting the way we measure this metric. 
Another observation is that Tab.WaitForJavaScriptCondition performed by the test also starts earlier and takes less time in the 'after' trace. 

These point to issues with the metric.
In the before case, the first large HTMLDocumentParser::processTokenizedChunkFromBackgroundParser event starts at 1034ms, lasts 17ms.

Then LocalFrameView::performLayout starts at 1069ms, lasts 27ms.
This then happens many more times, with the several more LocalFrameView::performLayouts that last 20s+. The biggest one starts at 1200ms and lasts 72ms.

In the after case, the first large HTMLDocumentParser::processTokenizedChunkFromBackgroundParser event starts at 941ms and lasts 243 ms. This includes a subevent of LocalFrameView::performLayout that lasts 147ms. However, there is no subsequent instance of LocalFrameView::performLayout that lasts for more than 10ms.

We're seeing that the Blink scheduler is choosing to do more processing/layout up front in the latter case. Furthermore, we're measuring different things with "successful load", which seems to measure time until first non-trivial performLayout finishes. In the latter case, the performLayout has done more work. If we had snapshots of the pages, we'd probably see very different results.

The main problem is that the "successful load" metric is highly dependent on how the Blink scheduler chooses to shard work, and is particularly sensitive to the timing of the first chunk of parse/layout. I bet I could tweak some constants in Blink scheduler to "improve" this metric by sharding out a tiny piece of parse/layout. It is not valid to compare "successful loads" across two different traces that have different shardings, since the metric is not comparing apples to apples. 
Screen Shot 2017-07-18 at 4.13.37 PM.png
75.8 KB View Download
Screen Shot 2017-07-18 at 4.18.39 PM.png
79.8 KB View Download
Owner: tdres...@chromium.org
Status: Assigned (was: Untriaged)
Over to tdresser@, metrics guru, to consider how to deal with this.
Cc: skyos...@chromium.org altimin@chromium.org
Status: WontFix (was: Assigned)
It's possible that https://chromium-review.googlesource.com/c/576371/ will fix this issue, depending on when paints occur after these layouts.

If not, I don't think this is a metric issue. If we do fewer layouts / paints, and it does cause content the user may care about to be painted later, that is a regression.

In any case, it seems like at worst, this change may tickle some bad behavior in our scheduling logic. +scheduling folks.

Marking WontFix, but it would be worth scheduling folks taking a look at this case.
Owner: skyos...@chromium.org
Reassigning to skyostril@ to weigh in from scheduling point of view. 
Reassigning to skyostril@ to weigh in from scheduling point of view. 

Sign in to add a comment