Issue metadata
Sign in to add a comment
|
96.2% regression in system_health.common_desktop at 481062:481128 |
||||||||||||||||||||
Issue descriptionSee the link to graphs below.
,
Jul 12 2017
Started bisect job https://chromeperf.appspot.com/buildbucket_job_status/8974258207175756288
,
Jul 12 2017
=== Auto-CCing suspected CL author borisv@chromium.org === Hi borisv@chromium.org, the bisect results pointed to your CL, please take a look at the results. === BISECT JOB RESULTS === Perf regression found with culprit Suspected Commit Author : borisv Commit : 9cea65766cfffee8db51e2f6edb694e0fe10ef93 Date : Wed Jun 21 06:38:37 2017 Subject: Speed up allocateBranchIsland by skipping whole regions of allocated memory. Bisect Details Configuration: mac_pro_perf_bisect Benchmark : system_health.common_desktop Metric : timeToFirstContentfulPaint_avg/load_tools/load_tools_stackoverflow Change : 128.41% | 143.055666665 -> 326.75666666 Revision Result N chromium@481061 143.056 +- 14.7069 6 good chromium@481095 135.539 +- 7.0154 6 good chromium@481112 135.025 +- 6.72794 6 good chromium@481120 136.018 +- 5.30243 6 good chromium@481124 135.919 +- 7.72545 6 good chromium@481125 308.221 +- 142.284 6 bad <-- chromium@481126 297.697 +- 138.468 6 bad chromium@481128 326.757 +- 43.5351 6 bad To Run This Test src/tools/perf/run_benchmark -v --browser=release --output-format=chartjson --upload-results --pageset-repeat=1 --also-run-disabled-tests --story-filter=load.tools.stackoverflow system_health.common_desktop More information on addressing performance regressions: http://g.co/ChromePerformanceRegressions Debug information about this bisect: https://chromeperf.appspot.com/buildbucket_job_status/8974258207175756288 For feedback, file a bug with component Speed>Bisection
,
Jul 12 2017
Any clue why we are seeing this only with Stackoverflow? I looked at other changes after this commit and overall noticed a nice 90ms+ improvement in first paint.
,
Jul 12 2017
Also, how did we eliminate the other changes as possible contributors? My change impacts the startup of ALL processes and hence should have impacted ALL startup times.
,
Jul 12 2017
I am not sure why we are only seeing it with stackoverflow, you can try looking at the traces as described in http://g.co/ChromePerformanceRegressions As for how we eliminated the other changes, you can look at the bisect results in #3. The bisect bot did a binary search, from 143.056ms at r481061 to 326.757ms at r481128. You can see there is a pretty clear jump from 135.919ms at r481124 to 308.221ms at r481125.
,
Jul 13 2017
Well, it took me a while to figure out what has been happening. Here is the story: 1. If I put a 3 seconds sleep in the entry point of my changes the test is as fast as before (takes ~190ms). If I remove the sleep, the test takes ~300ms. If I comment the whole code (it is at EnableCFBundleBlocker() function), it still takes ~300ms. This means that my optimization of the process startup time is actually causing more contentions and while speeding up most things, it is slowing down this particular test. I am not sure how to proceed and I would like to hear feedback from the owners of the test.
,
Jul 13 2017
Adding kouhei, ksakamoto, and tdresser since this is a regression in timeToFirstContentfulPaint_avg. Adding nednguyen since IIUC the CL affects chrome startup, but we're seeing the regression in page load test cases since we start the browser every time.
,
Jul 14 2017
Would you post traces before/after change?
,
Jul 14 2017
Results are attached. The file sleep3 contains the code, compiled with a sleep. In both files, look for load_tools@@timeToFirstContentfulPaint_avg. Steps to generate these files: 1. Get chrome sources 2. Build release version of Chrome for Mac. 3. Run: $tools/perf/run_benchmark -v --browser=release --output-format=chartjson --upload-results --pageset-repeat=10 --also-run-disabled-tests --story-filter=load.tools.stackoverflow system_health.common_desktop 4. Open the result .json file as printed by the test run, e.g. ~/chromium/src/tools/perf/results-chart.json or save at, as the next run will override it. 5. Open the file chrome/common/mac/cfbundle_blocker.mm and around line 230 (just before the mach_override_ptr), add this line: sleep(3); 6. Repeat #2-#4. Compare the two results, looking particularly at load_tools@@timeToFirstContentfulPaint_avg
,
Jul 17 2017
Can you provide traces as per https://www.chromium.org/developers/how-tos/trace-event-profiling-tool/recording-tracing-runs? Thanks!
,
Jul 17 2017
Traces are available in the perf dashboard. Before: https://goo.gl/pv5fri After: https://goo.gl/YDSYsp I did further investigation. Dropping ~75ms of sleep in the startup of all subprocesses indeed speeds up the test back to its original values. I believe that the test that is failing is the green portion in the traces (at least it matches closely the reported times before and after the regression).
,
Jul 18 2017
#12: to grab the trace link, you need to right click the trace button & click "Copy Link Address". The trace link in the browser address bar doesn't work because it's generated by cloud storage service ephemerally.
,
Jul 18 2017
Good point. Here are the links: Before: https://console.developers.google.com/m/cloudstorage/b/chrome-telemetry-output/o/trace-file-id_35-2017-06-20_21-11-39-80084.html After: https://console.developers.google.com/m/cloudstorage/b/chrome-telemetry-output/o/trace-file-id_35-2017-06-21_02-40-27-29300.html
,
Jul 18 2017
It's a bit puzzling that there's a bigger layout in the second trace. In the first trace, the longest layout is 72ms, but in the second, it's 147ms. The CPU duration is also shorter - it's not just an issue of contention. Based on your description of the issue, it's also surprising that the navigation start occurs later in the second trace. Wouldn't we expect that to happen sooner?
,
Jul 18 2017
Yes, I saw the layout taking longer, but the change does not impact it. Maybe I am not looking at the right region, but the green at the top shows that it started 105 ms earlier in the "after" set. Before: Title Successful Load Start 918.203 ms Wall Duration 180.088 ms After: Title Successful Load Start 813.157 ms Wall Duration 385.073 ms
,
Jul 18 2017
I was looking at the first navigationStart slice, and the first contentful paint metric in the metrics side panel. I'm not sure what the Successful load slice means.
,
Jul 18 2017
Well, the metrics do not tell me much - they just confirm this bug. I am looking at NavigationStartToCommit: 1. Navigation to about:blank starts faster after the change (625ms -> 554ms) and is shorter (197ms -> 115ms). This is inline with my change, as both the browser and the renderer have faster startup times. 2. Second navigation (stack overflow) (919, 73) -> (813, 34). 3. There seem to be another navigation handle (to casalemedia.com) that doesn't seem to be measured by the test. That one also starts earlier after my change, but takes longer to complete (1058, 227) -> (960, 298) I am sure, I am missing something and I am really new to this kind of investigations. Is there a way for the UI to show only the slice that the test actually measures? Is this the green slice that I am looking at, or am I looking at something different?
,
Jul 18 2017
(+Erik, who has tons of experience with those).
,
Jul 18 2017
Metric visualizations are really bad currently. The loading metrics don't show up in the trace UI. +dproy is going to do some work on fixing this this quarter. +benjhayden@, what is the "successful load" slice?
,
Jul 18 2017
Erik and I looked at the traces. Overall after the change everything starts faster and the total CPU usage seem to decrease. One thing that we noticed as difference is that in the before case we see three separate performLayout calls, but in the after change we see only one. It is possible that the renderer is somehow merging the page loader events impacting the way we measure this metric. Another observation is that Tab.WaitForJavaScriptCondition performed by the test also starts earlier and takes less time in the 'after' trace. These point to issues with the metric.
,
Jul 18 2017
In the before case, the first large HTMLDocumentParser::processTokenizedChunkFromBackgroundParser event starts at 1034ms, lasts 17ms. Then LocalFrameView::performLayout starts at 1069ms, lasts 27ms. This then happens many more times, with the several more LocalFrameView::performLayouts that last 20s+. The biggest one starts at 1200ms and lasts 72ms. In the after case, the first large HTMLDocumentParser::processTokenizedChunkFromBackgroundParser event starts at 941ms and lasts 243 ms. This includes a subevent of LocalFrameView::performLayout that lasts 147ms. However, there is no subsequent instance of LocalFrameView::performLayout that lasts for more than 10ms. We're seeing that the Blink scheduler is choosing to do more processing/layout up front in the latter case. Furthermore, we're measuring different things with "successful load", which seems to measure time until first non-trivial performLayout finishes. In the latter case, the performLayout has done more work. If we had snapshots of the pages, we'd probably see very different results. The main problem is that the "successful load" metric is highly dependent on how the Blink scheduler chooses to shard work, and is particularly sensitive to the timing of the first chunk of parse/layout. I bet I could tweak some constants in Blink scheduler to "improve" this metric by sharding out a tiny piece of parse/layout. It is not valid to compare "successful loads" across two different traces that have different shardings, since the metric is not comparing apples to apples.
,
Jul 18 2017
Over to tdresser@, metrics guru, to consider how to deal with this.
,
Jul 19 2017
It's possible that https://chromium-review.googlesource.com/c/576371/ will fix this issue, depending on when paints occur after these layouts. If not, I don't think this is a metric issue. If we do fewer layouts / paints, and it does cause content the user may care about to be painted later, that is a regression. In any case, it seems like at worst, this change may tickle some bad behavior in our scheduling logic. +scheduling folks. Marking WontFix, but it would be worth scheduling folks taking a look at this case.
,
Jul 25 2017
Reassigning to skyostril@ to weigh in from scheduling point of view.
,
Jul 26 2017
Reassigning to skyostril@ to weigh in from scheduling point of view. |
|||||||||||||||||||||
►
Sign in to add a comment |
|||||||||||||||||||||
Comment 1 by 42576172...@developer.gserviceaccount.com
, Jul 12 2017